Scalable probabilistic forecasting in retail with gradient boosted trees: A practitioner’s approach

Long, Xueying; Bui, Quang; Oktavian, Grady; F. Schmidt, Daniel; Bergmeir, Christoph Norbert; Godahewa, Rakshitha; Per Lee, Seong; Zhao, Kaifeng; Condylis, Paul

doi:10.1016/j.ijpe.2024.109449

Identifiers

URI: https://hdl.handle.net/10481/97464

DOI: 10.1016/j.ijpe.2024.109449

Publisher

Elsevier

Subject

Probabilistic forecasting

Gradient boosted trees

Global models

Date

2024-11-12

Bibliographic reference

Long, X. et al. Int. J. Production Economics 279 (2025) 109449. [https://doi.org/10.1016/j.ijpe.2024.109449]

Sponsor

María Zambrano (Senior) Fellowship by the Spanish Ministry of Universities; Next Generation funds from the European Union

Abstract

The recent M5 competition has advanced the state of the art in retail forecasting. However, there are important differences between the competition challenge and the challenges we face in a large e-commerce company. The datasets in our scenario are larger (hundreds of thousands of time series), and e-commerce can afford to have a larger stock assortment than brick-and-mortar retailers, leading to more intermittent data. To scale to larger dataset sizes with feasible computational effort, we investigate a two-layer hierarchy, namely the decision level with product unit sales and an aggregated level, e.g., through warehouse-product aggregation, reducing the number of series and the degree of intermittency. We propose a top-down approach that forecasts at the aggregated level and then disaggregates to obtain decision-level forecasts. Probabilistic forecasts are generated under distributional assumptions. The proposed scalable method is evaluated on a large proprietary dataset as well as on the publicly available Corporación Favorita and M5 datasets. We show how the characteristics of the e-commerce and brick-and-mortar retail datasets differ. Notably, our top-down forecasting framework enters the top 50 of the original M5 competition, even with models trained at a higher level under a much simpler setting.
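The top-down step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the disaggregation rule or the distribution used, so both choices here are assumptions made only for illustration: the aggregated point forecast is split by historical sales share, and quantiles are read from a Poisson distribution whose mean is the disaggregated forecast. The function names `disaggregate` and `poisson_quantile` and the SKU identifiers are hypothetical, and the gradient boosted tree model that would produce the aggregated forecast is not shown.

```python
import math


def disaggregate(agg_forecast, history_by_item):
    """Split an aggregated point forecast across items.

    Assumption: shares are proportional to each item's total historical
    sales. The paper may use a different rule.
    """
    totals = {item: sum(sales) for item, sales in history_by_item.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No sales history at all: fall back to equal shares.
        equal_share = 1.0 / len(totals)
        return {item: agg_forecast * equal_share for item in totals}
    return {item: agg_forecast * t / grand_total for item, t in totals.items()}


def poisson_quantile(mean, q):
    """Smallest integer k with P(X <= k) >= q for X ~ Poisson(mean).

    Assumption: a Poisson distribution stands in for the unspecified
    distributional assumption. Intended for small means, as in
    intermittent demand; very large means would underflow exp(-mean).
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    if mean <= 0:
        return 0
    k = 0
    pmf = math.exp(-mean)  # P(X = 0)
    cdf = pmf
    # Upper bound on k guards against floating-point stalls near q = 1.
    max_k = int(mean + 50 * math.sqrt(mean) + 50)
    while cdf < q and k < max_k:
        k += 1
        pmf *= mean / k  # P(X = k) from P(X = k - 1)
        cdf += pmf
    return k


# Hypothetical example: three products sharing one warehouse-level series.
history = {"sku_a": [3, 0, 1], "sku_b": [0, 0, 1], "sku_c": [5, 4, 6]}
item_means = disaggregate(10.0, history)
item_quantiles = {
    item: [poisson_quantile(m, q) for q in (0.5, 0.95)]
    for item, m in item_means.items()
}
```

The disaggregated means sum back to the aggregated forecast by construction, which keeps the two levels of the hierarchy coherent for point forecasts. The quantiles are computed per item and are not guaranteed to add up across items.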

Collections

OpenAIRE (Open Access Infrastructure for Research in Europe)

Except where otherwise noted, this item's license is described as Attribution 4.0 International