Outlier detection as the primary step for promotion planning in retail

Forecasting the increase in customer demand during discount promotions is a fundamental business task in retail, and one that can now be performed by sophisticated data mining algorithms. The forecasts are based on data collected during previous promotions. Apart from the choice of mining algorithm, the quality of a prediction model depends strongly on the quality of its training data. Outliers are data points that do not conform to a defined notion of normal behavior; they are therefore either excluded from the training set or given a different weight in the model than the remaining data. In this paper we propose a new approach to outlier analysis that aims to distinguish outliers associated with an outlier-generating store or product from outliers that can be classified as noisy data. Outlier analysis is performed on a multidimensional view of the dataset, as is typical for data warehousing and OLAP. We also introduce measures that estimate the probability of a store or product being an outlier generator, and we conduct experiments to determine their critical threshold values.
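
To make the distinction between outlier generators and noisy data concrete, the following is a minimal sketch and not the measures defined in this paper. It assumes a hypothetical record layout (`store`, `product`, `uplift`), flags outliers with a generic modified z-score based on the median and the median absolute deviation, and then computes, per store, the share of that store's promotion records that were flagged. The `cutoff` of 3.5 and the `threshold` of 0.5 are illustrative assumptions; the critical threshold values in the paper are determined experimentally.

```python
from collections import defaultdict
from statistics import median


def flag_outliers(values, cutoff=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds cutoff.

    This is a generic robust rule used for illustration only.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: nothing can be flagged by this rule.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > cutoff for v in values]


def outlier_share(records, flags, dimension):
    """Share of flagged records for each member of a dimension."""
    total = defaultdict(int)
    flagged = defaultdict(int)
    for rec, is_outlier in zip(records, flags):
        total[rec[dimension]] += 1
        flagged[rec[dimension]] += int(is_outlier)
    return {member: flagged[member] / total[member] for member in total}


# Hypothetical promotion records: demand uplift per store and product.
uplifts = {
    "S1": [1.2, 1.3, 1.1, 1.25],
    "S2": [1.15, 1.3, 6.0, 1.2],   # one isolated extreme value
    "S3": [5.5, 6.2, 1.2, 5.8],    # extreme values on most products
}
records = [
    {"store": store, "product": f"P{i + 1}", "uplift": value}
    for store, values in uplifts.items()
    for i, value in enumerate(values)
]

flags = flag_outliers([rec["uplift"] for rec in records])
shares = outlier_share(records, flags, "store")

# Assumed threshold: stores above it are treated as candidate generators.
threshold = 0.5
generators = sorted(s for s, share in shares.items() if share > threshold)

# Flagged records that do not belong to a generator are treated as noise.
noise = [
    (rec["store"], rec["product"])
    for rec, is_outlier in zip(records, flags)
    if is_outlier and rec["store"] not in generators
]

print(shares, generators, noise)
```

Passing `"product"` instead of `"store"` to `outlier_share` gives the same share along the product dimension, which mirrors the multidimensional view described above.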
