Image credit: Unsplash

Data Science Summit 2018

XGBoost as a time-series forecasting tool

Abstract

The goal of this presentation and the associated paper is to present the results of an investigation into the use of the Extreme Gradient Boosting (XGBoost) algorithm as a forecasting tool. The case study is based on data provided by the Rossmann company, together with a request to design an innovative prediction method. The data describes the micro- and macro-environment, as well as the turnover, of 1115 stores. The performance of the algorithm was compared with the classical forecasting models SARIMAX and Holt-Winters, using time-series cross-validation and tests for the statistical significance of differences in prediction quality. Three metrics were analyzed: root mean squared percentage error (RMSPE), Theil’s coefficient and the adjusted correlation coefficient. The results were then passed to Rossmann for verification on a separate validation set, via the Kaggle.com platform. The study confirmed that XGBoost, given proper data preparation and a proper training method, achieves better results than the classical models.
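The abstract names RMSPE and time-series cross-validation without spelling them out, so the following is a minimal sketch of both under stated assumptions. It uses the common definition of RMSPE, the square root of the mean of ((actual - predicted) / actual)^2, and an expanding-window split in which the training data always precedes the test data. Skipping observations with zero actual turnover is an assumption made here to avoid division by zero, and the names `rmspe` and `expanding_window_splits` are illustrative, not taken from the paper.

```python
import math


def rmspe(actual, predicted):
    """Root mean squared percentage error.

    Assumption: observations whose actual value is zero are skipped,
    because the percentage error is undefined for them.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("RMSPE is undefined when every actual value is zero")
    squared = [((a - p) / a) ** 2 for a, p in pairs]
    return math.sqrt(sum(squared) / len(squared))


def expanding_window_splits(n_obs, n_folds, horizon):
    """Yield (train_indices, test_indices) pairs for time-ordered data.

    Each fold tests on the next `horizon` observations and trains on
    everything before them, so no future data leaks into training.
    """
    first_test_start = n_obs - n_folds * horizon
    if first_test_start <= 0:
        raise ValueError("not enough observations for the requested folds")
    for fold in range(n_folds):
        start = first_test_start + fold * horizon
        yield list(range(start)), list(range(start, start + horizon))


if __name__ == "__main__":
    # Hypothetical turnover values, for illustration only.
    print(rmspe([100, 200, 0], [110, 180, 5]))
    for train, test in expanding_window_splits(n_obs=10, n_folds=2, horizon=3):
        print(train, test)
```

In a comparison like the one described above, each candidate model would be fitted on the training indices of a fold and scored with `rmspe` on the test indices, giving one score per fold per model.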

Date
Event
Location
Warsaw, Poland

Presentation

Filip Wójcik
Senior Data Scientist and PhD candidate

Data scientist and university researcher, passionate about machine learning and statistical analysis. Also an experienced software developer who has worked with a range of technologies (from .NET to open-source).