Automating concept-drift detection by self-evaluating predictive model degradation

A key requirement for automating predictive machine learning is the ability to trigger an update of the trained model at the right time. This requires automatic techniques that self-assess prediction quality and measure the drift in data distribution between the original training set and newly arriving data. In this paper, we propose a novel methodology to automatically detect the prediction-quality degradation of machine learning models caused by class-based concept drift, i.e., when new data contains samples that do not fit the set of class labels known to the currently trained predictive model. Experiments on synthetic and real-world public datasets show that the proposed methodology is effective at automatically detecting and describing concept drift caused by changes in the class-label data distributions.
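To make the idea of class-based drift concrete, the following is a minimal illustrative sketch and not the methodology proposed in the paper, whose details are not given in this abstract. It assumes that per-class cohesion can be scored with a silhouette index computed on the model's predicted labels, and that a drop in that score relative to the training baseline signals degradation. The nearest-centroid classifier, the function names, the toy data, and the 0.1 threshold are all hypothetical choices made for illustration.

```python
import math
import random


def mean_silhouette_per_class(points, labels):
    """Return {label: mean silhouette of the points carrying that label}."""
    by_label = {}
    for p, y in zip(points, labels):
        by_label.setdefault(y, []).append(p)
    scores = {y: [] for y in by_label}
    for p, y in zip(points, labels):
        own = by_label[y]
        if len(own) < 2 or len(by_label) < 2:
            scores[y].append(0.0)  # silhouette is undefined here; use 0
            continue
        # a: mean distance to the other points with the same label
        # (p itself is in `own` and contributes a distance of 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the points of any other label.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for z, other in by_label.items() if z != y
        )
        scores[y].append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return {y: sum(s) / len(s) for y, s in scores.items()}


def fit_centroids(points, labels):
    """Hypothetical stand-in for the trained model: one centroid per class."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: tuple(sum(c) / len(ps) for c in zip(*ps)) for y, ps in groups.items()}


def predict(centroids, points):
    """Assign each point to the nearest known class centroid."""
    return [min(centroids, key=lambda y: math.dist(centroids[y], p)) for p in points]


def detect_degradation(baseline, points, predicted, threshold=0.1):
    """Flag each known class whose silhouette dropped by more than `threshold`."""
    current = mean_silhouette_per_class(points, predicted)
    return {y: baseline[y] - s > threshold for y, s in current.items() if y in baseline}


def blob(rng, center, n, sigma=0.5):
    """Draw n two-dimensional Gaussian points around `center` (toy data)."""
    return [(rng.gauss(center[0], sigma), rng.gauss(center[1], sigma)) for _ in range(n)]


rng = random.Random(0)
train_points = blob(rng, (0.0, 0.0), 30) + blob(rng, (10.0, 0.0), 30)
train_labels = ["A"] * 30 + ["B"] * 30
model = fit_centroids(train_points, train_labels)
baseline = mean_silhouette_per_class(train_points, train_labels)

# New window drawn from the same two classes: no drift expected.
stable = blob(rng, (0.0, 0.0), 30) + blob(rng, (10.0, 0.0), 30)
stable_flags = detect_degradation(baseline, stable, predict(model, stable))

# Same window plus samples of an unseen class, which the model can only
# label as one of the classes it already knows.
drifted = stable + blob(rng, (0.0, 8.0), 30)
drift_flags = detect_degradation(baseline, drifted, predict(model, drifted))

print("stable window:", stable_flags)
print("drifted window:", drift_flags)
```

In this toy setup the unseen samples lie closer to class A, so the model assigns them to A and the cohesion of A falls, while B is unaffected. The per-class flags therefore also indicate which known class is absorbing the unknown samples, which is one simple way to describe the drift rather than only detect it.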
