Paper information - Automating concept-drift detection by self-evaluating predictive model degradation

A key aspect of automating predictive machine learning is the ability to trigger an update of the trained model at the right time. To this end, automatic solutions must be devised that self-assess both the prediction quality and the distribution drift between the original training set and the new data. In this paper, we propose a novel methodology to automatically detect prediction-quality degradation of machine learning models due to class-based concept drift, i.e., when new data contain samples that do not fit the set of class labels known to the currently trained predictive model. Experiments on synthetic and real-world public datasets show the effectiveness of the proposed methodology in automatically detecting and describing concept drift caused by changes in the class-label data distributions.
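The abstract does not spell out how degradation is measured, so the following is only a minimal sketch of one plausible reading, not the paper's method. It assumes class cohesion can be approximated by the mean silhouette coefficient of each class, computed once on the training set alone and once on the training set plus newly predicted samples. The function names (`silhouette_by_class`, `degradation`), the toy data, and the 0.2 threshold are all hypothetical choices made for illustration.

```python
from math import dist
from statistics import mean


def silhouette_by_class(points, labels):
    """Mean silhouette coefficient of each class (Euclidean distance)."""
    classes = sorted(set(labels))
    scores = {c: [] for c in classes}
    for i, (p, own) in enumerate(zip(points, labels)):
        # Mean distance from p to the members of every class, excluding p itself.
        mean_dist = {}
        for c in classes:
            d = [dist(p, q)
                 for j, (q, lab) in enumerate(zip(points, labels))
                 if lab == c and j != i]
            if d:
                mean_dist[c] = mean(d)
        others = [v for c, v in mean_dist.items() if c != own]
        if own not in mean_dist or not others:
            scores[own].append(0.0)  # singleton class or single-class data
            continue
        a, b = mean_dist[own], min(others)
        scores[own].append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return {c: mean(s) for c, s in scores.items()}


def degradation(train_X, train_y, new_X, new_pred):
    """Per-class drop in mean silhouette once new predictions are added."""
    base = silhouette_by_class(train_X, train_y)
    now = silhouette_by_class(train_X + new_X, train_y + new_pred)
    return {c: base[c] - now[c] for c in base}


# Toy data (assumption): two compact known classes.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
# New samples from an unseen class; a model that only knows labels 0 and 1
# is assumed to have assigned all of them to class 0.
new_X = [(4, 8), (4, 9), (5, 8), (5, 9)]
new_pred = [0, 0, 0, 0]

drop = degradation(train_X, train_y, new_X, new_pred)
THRESHOLD = 0.2  # arbitrary illustrative value, not taken from the paper
flagged = [c for c, d in drop.items() if d > THRESHOLD]
print(drop, flagged)  # class 0 is flagged, class 1 is not
```

In this toy setup the samples forced into class 0 stretch that class well beyond its original extent, so its silhouette falls sharply while class 1 is almost unaffected. A per-class score of this kind both signals the drift and points to the class in which it appears, which matches the abstract's goal of detecting and describing drift.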
