Anomaly Detection for an E-commerce Pricing System

Online retailers execute far more price updates than brick-and-mortar stores. Even a few mispriced items can have a significant business impact and erode customer trust, so early, automated, real-time detection of anomalies is an important part of such a pricing system. In this paper, we describe the unsupervised and supervised anomaly detection approaches we developed and deployed for a large-scale online pricing system at Walmart. Our system detects anomalies in both batch and real-time streaming settings, and flagged items are reviewed and actioned according to priority and business impact. We found that the right architecture design was critical to model performance at scale, and that business impact and speed were important factors influencing model selection, parameter choice, and prioritization in a production environment for a large-scale system. We analyzed the performance of various approaches on a test set of real-world retail data and fully deployed our approach into production, where it detected the most important anomalies with high precision.
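As a rough illustration of the batch workflow described above (flag suspicious price updates, then rank them by business impact), the following minimal sketch may help. It is not the method used in the paper: the robust z-score on log price ratios, the 3.5 threshold, the `PriceUpdate` fields, and the impact proxy (absolute price change times daily units sold) are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from statistics import median


@dataclass
class PriceUpdate:
    # Hypothetical record layout; the abstract does not specify the features used.
    item_id: str
    old_price: float
    new_price: float
    daily_units: int


def robust_z_scores(values):
    """Modified z-scores based on the median and median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate batch (at least half the values equal the median): flag nothing.
        return [0.0 for _ in values]
    # 0.6745 rescales MAD so scores are comparable to standard z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def flag_price_updates(updates, threshold=3.5):
    """Return anomalous updates, highest estimated business impact first."""
    log_ratios = [math.log(u.new_price / u.old_price) for u in updates]
    scores = robust_z_scores(log_ratios)
    flagged = [u for u, z in zip(updates, scores) if abs(z) > threshold]
    # Assumed impact proxy: size of the price change times daily sales volume.
    return sorted(
        flagged,
        key=lambda u: abs(u.new_price - u.old_price) * u.daily_units,
        reverse=True,
    )


# Made-up sample batch: most updates are small; D and G look like decimal errors.
updates = [
    PriceUpdate("A", 10.00, 10.50, 100),
    PriceUpdate("B", 20.00, 19.00, 50),
    PriceUpdate("C", 5.00, 5.10, 200),
    PriceUpdate("D", 100.00, 1.00, 30),
    PriceUpdate("E", 8.00, 8.20, 10),
    PriceUpdate("F", 15.00, 14.50, 80),
    PriceUpdate("G", 50.00, 500.00, 5),
]

for update in flag_price_updates(updates):
    print(update.item_id, update.old_price, "->", update.new_price)
```

Scoring the log ratio rather than the raw price difference treats a 10x increase and a 10x decrease symmetrically, and the median-based statistics keep a few extreme updates from masking each other. A production system of the kind the abstract describes would likely use richer features and learned models; this sketch only shows the flag-then-prioritize shape of the problem.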
