Using Scanner Data for Consumer Price Index Compilation: Methodological Challenges and Empirical Evidence from E-commerce
Author:
Jumaikhanova, Madina
Thesis type:
Diploma thesis
Supervisor:
Musil, Petr
Reviewers:
-
Language:
English
Abstract:
This thesis examines the methodological implications of integrating scanner data into the compilation of Consumer Price Indices (CPI), with a specific focus on the empirical performance of alternative index number formulas in the e-commerce sector. Using granular, transaction-level data from two retailers across selected Information and Communication subgroups (COICOP 08.1.x), the research constructs and evaluates a set of bilateral indices (Laspeyres, Paasche, Fisher, and Jevons) covering the period from December 2024 to November 2025. The primary objective is to determine how the choice of formula influences measured inflation dynamics and to assess the robustness of classical index number theory when applied to high-frequency "big data" environments. The empirical findings show that index behavior is highly sensitive to the structural characteristics of individual product subgroups. In relatively stable categories, such as 08.1.3 and 08.1.5, the indices converge closely, in line with traditional theoretical expectations. Conversely, in more volatile subgroups, notably 08.1.2 and 08.1.4, the indices diverge significantly, with pronounced price declines and deviations from the relationships predicted by standard theory. Cross-retailer comparisons reveal that idiosyncratic factors, including heterogeneous pricing strategies, high product turnover (churning), and shifting consumption patterns, fundamentally alter index outcomes even within identical COICOP categories. While the unweighted Jevons index is more sensitive to price dispersion, the Fisher Ideal index provides a theoretically superior and more balanced approximation: as the geometric mean of the Laspeyres and Paasche indices, it mitigates their opposing biases. Furthermore, this study evaluates the empirical validity of the Bortkiewicz inequality as a theoretical benchmark for the relationship between weighted and unweighted indices.
The results indicate that, while the inequality holds under stable market conditions, it is frequently violated in dynamic e-commerce settings marked by high price variability and rapid structural shifts in consumption. These deviations highlight the limitations of traditional axiomatic assumptions when applied to scanner data. Ultimately, this thesis contributes to the modernization of official statistics by providing evidence-based insights into the methodological rigor required to measure inflation accurately in the digital retail landscape.
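As a minimal illustration (a sketch, not the thesis's own code), the four bilateral formulas named in the abstract can be computed for a matched set of items as follows. The sketch assumes item-level prices and quantities are available for both periods and that every item is present with a positive price and quantity in both, so it deliberately ignores the product churn discussed above. The function names `bilateral_indices` and `bortkiewicz_check` and the three-item data are hypothetical. The second function checks the textbook form of the Bortkiewicz decomposition, P/L - 1 = r * v_x * v_y, where r is the correlation between price relatives and quantity relatives weighted by base-period expenditure shares and v_x, v_y are their weighted coefficients of variation; the thesis may formulate the inequality for weighted versus unweighted indices differently.

```python
import math
from typing import Sequence, Tuple


def bilateral_indices(
    p0: Sequence[float], q0: Sequence[float],
    p1: Sequence[float], q1: Sequence[float],
) -> Tuple[float, float, float, float]:
    """Laspeyres, Paasche, Fisher and Jevons price indices for matched items.

    p0/q0 are base-period prices and quantities, p1/q1 comparison-period ones;
    position i must refer to the same product in all four sequences (assumption).
    """
    n = len(p0)
    if n == 0 or not (n == len(q0) == len(p1) == len(q1)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Laspeyres: basket fixed at base-period quantities
    laspeyres = sum(a * b for a, b in zip(p1, q0)) / sum(a * b for a, b in zip(p0, q0))
    # Paasche: basket fixed at comparison-period quantities
    paasche = sum(a * b for a, b in zip(p1, q1)) / sum(a * b for a, b in zip(p0, q1))
    # Fisher: geometric mean of Laspeyres and Paasche
    fisher = math.sqrt(laspeyres * paasche)
    # Jevons: unweighted geometric mean of the price relatives p1/p0
    jevons = math.exp(sum(math.log(b / a) for a, b in zip(p0, p1)) / n)
    return laspeyres, paasche, fisher, jevons


def bortkiewicz_check(p0, q0, p1, q1) -> Tuple[float, float, float]:
    """Return (P/L - 1, r * v_x * v_y, r) for the Bortkiewicz decomposition.

    x are price relatives and y quantity relatives, both averaged with
    base-period expenditure shares w; r is their weighted correlation and
    v_x, v_y their weighted coefficients of variation.
    """
    # Validates the inputs and gives the two indices being compared
    laspeyres, paasche, _, _ = bilateral_indices(p0, q0, p1, q1)
    total0 = sum(a * b for a, b in zip(p0, q0))
    w = [a * b / total0 for a, b in zip(p0, q0)]  # base expenditure shares
    x = [b / a for a, b in zip(p0, p1)]  # price relatives
    y = [b / a for a, b in zip(q0, q1)]  # quantity relatives
    x_bar = sum(wi * xi for wi, xi in zip(w, x))  # equals the Laspeyres price index
    y_bar = sum(wi * yi for wi, yi in zip(w, y))  # equals the Laspeyres quantity index
    cov = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_x = math.sqrt(sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x)))
    s_y = math.sqrt(sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y)))
    r = cov / (s_x * s_y) if s_x > 0 and s_y > 0 else 0.0
    rhs = r * (s_x / x_bar) * (s_y / y_bar)
    return paasche / laspeyres - 1.0, rhs, r


# Hypothetical three-item example: item 1 gets 20% cheaper and its quantity rises 50%.
p0, q0 = [100.0, 50.0, 20.0], [10.0, 20.0, 50.0]
p1, q1 = [80.0, 55.0, 21.0], [15.0, 18.0, 48.0]
L, P, F, J = bilateral_indices(p0, q0, p1, q1)
lhs, rhs, r = bortkiewicz_check(p0, q0, p1, q1)
print(f"Laspeyres={L:.4f} Paasche={P:.4f} Fisher={F:.4f} Jevons={J:.4f}")
print(f"P/L - 1 = {lhs:.5f}, r*v_x*v_y = {rhs:.5f}, r = {r:.3f}")
```

Because v_x and v_y are non-negative, the sign of P/L - 1 follows the sign of r. When buyers shift toward items whose prices fell (negative r, as in the example), Paasche lies below Laspeyres and Fisher lies between them; a positive r reverses that ordering, which is one way the expected relationship can break down in volatile data.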
Keywords:
Scanner Data; E-commerce; Index Number Theory; Multilateral Methods; COICOP; Bortkiewicz Inequality; Consumer Price Index