Clustering to forecast sparse time-series data

Accurate forecasting is essential to successful inventory planning in retail. Unfortunately, there is not always enough historical data to forecast items individually. This is particularly true in e-commerce, where there is a long tail of low-selling items and, unlike in physical stores, items are introduced and phased out quite frequently. In such scenarios, it is preferable to forecast items in well-designed groups of similar items, so that data for different items can be pooled to fit a single model. In this paper, we first discuss the desiderata for such a grouping and how it differs from the traditional clustering problem. We then describe our approach, a scalable local search heuristic that naturally handles the constraints required in this setting and produces solutions competitive with well-known clustering algorithms. We also address the complementary problem of estimating similarity, particularly for new items that have no past sales. Our solution is to regress the sales profiles of items against their semantic features, so that, given only the semantic features of a new item, we can predict its relation to other items in terms of as-yet-unobserved sales. Our experiments demonstrate both the scalability of our approach and its implications for forecast accuracy.
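To make the idea of a constrained local search concrete, the following is a minimal sketch and not the heuristic from the paper. It assumes each item is summarized by a numeric sales-profile vector, uses within-group squared error as a stand-in objective, and enforces a single illustrative constraint, a minimum group size. The names `pooled_cost`, `local_search`, and `min_size` are hypothetical.

```python
import numpy as np

def pooled_cost(X, labels, k):
    """Sum of squared distances from each item to its group mean."""
    total = 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members):
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def local_search(X, k, min_size, seed=0, max_passes=50):
    """Single-item-move local search under a minimum group size.

    Assumes n >= k * min_size so the balanced random start is feasible.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    labels = rng.permutation(n) % k          # balanced random start
    current = pooled_cost(X, labels, k)
    for _ in range(max_passes):
        improved = False
        for i in range(n):
            src = labels[i]
            if np.sum(labels == src) <= min_size:
                continue                     # moving i would break the constraint
            best_dst, best_cost = src, current
            for dst in range(k):
                if dst == src:
                    continue
                labels[i] = dst              # tentatively move item i
                c = pooled_cost(X, labels, k)
                if c < best_cost - 1e-12:
                    best_dst, best_cost = dst, c
            labels[i] = best_dst             # keep the best move, or revert
            if best_dst != src:
                current = best_cost
                improved = True
        if not improved:
            break                            # local optimum reached
    return labels, current
```

Because every candidate move is checked for feasibility before it is evaluated, constraints of this kind fit into the search without changing the objective. Recomputing the full cost for each move is quadratic and only suitable for illustration; a scalable version would update the cost incrementally.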
