Statistical Mechanics of On-Line Learning Under Concept Drift

We introduce a modeling framework for the investigation of on-line machine learning processes in non-stationary environments. We exemplify the approach in terms of two specific model situations: in the first, we consider the learning of a classification scheme from clustered data by means of prototype-based Learning Vector Quantization (LVQ); in the second, we study the training of layered neural networks with sigmoidal activations for the purpose of regression. In both cases, the target, i.e., the classification or regression scheme, is considered to change continuously while the system is trained from a stream of labeled data. We extend and apply methods borrowed from statistical physics, which have frequently been used for the exact description of training dynamics in stationary environments. The extended approach allows for the computation of typical learning curves in the presence of concept drift in a variety of model situations. First results are presented and discussed for stochastic drift processes in classification and regression problems. They indicate that LVQ is capable of tracking a classification scheme under drift to a non-trivial extent. Furthermore, we show that concept drift can cause the persistence of sub-optimal plateau states in gradient-based training of layered neural networks for regression.
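As a rough illustration of the first model situation, the Python sketch below simulates on-line LVQ1 (winner-takes-all updates, one prototype per class) on two Gaussian clusters whose centers drift randomly while labeled examples stream in. The data model, the random-walk drift process, and all parameter names (`lam`, `delta`, `eta`, `alpha_max`) are illustrative assumptions, not the paper's exact setup. It is also a plain Monte Carlo simulation at finite dimension `N`, not the statistical-physics calculation of typical learning curves described above.

```python
import numpy as np


def lvq1_under_drift(N=50, alpha_max=20.0, eta=1.0, lam=2.0, delta=0.5,
                     n_test=2000, seed=0):
    """Simulate on-line LVQ1 on two drifting Gaussian clusters (toy sketch).

    Assumed setup (hypothetical, for illustration only): class s in {0, 1}
    has mean lam * B[s] and unit isotropic variance. Each center performs a
    small random walk so that its overlap with its earlier position decays
    roughly like exp(-delta * alpha), with alpha = (examples seen) / N.
    Returns the error rate on fresh samples from the final clusters.
    """
    rng = np.random.default_rng(seed)
    # Orthonormal initial cluster centers, shape (2, N).
    B = np.linalg.qr(rng.standard_normal((N, 2)))[0].T
    # One prototype per class (row 0 -> class 0, row 1 -> class 1), near the origin.
    W = 1e-3 * rng.standard_normal((2, N))
    # Random-walk step size: each step shrinks a center's overlap with its
    # previous position by a factor of about (1 - delta / N) on average.
    eps = np.sqrt(2.0 * delta) / N

    for _ in range(int(alpha_max * N)):
        # Draw one labeled example from the current (drifted) clusters.
        s = rng.integers(2)
        xi = lam * B[s] + rng.standard_normal(N)
        # LVQ1: move the closest prototype towards the example if its
        # label is correct, away from it otherwise.
        j = int(np.argmin(((W - xi) ** 2).sum(axis=1)))
        sign = 1.0 if j == s else -1.0
        W[j] += (eta / N) * sign * (xi - W[j])
        # Concept drift: perturb both centers, then renormalize to unit length.
        B = B + eps * rng.standard_normal((2, N))
        B /= np.linalg.norm(B, axis=1, keepdims=True)

    # Estimate the generalization error with respect to the final target.
    labels = rng.integers(2, size=n_test)
    X = lam * B[labels] + rng.standard_normal((n_test, N))
    dists = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(dists.argmin(axis=1) != labels))
```

Comparing `lvq1_under_drift(delta=0.0)` with runs at increasing `delta` gives a crude, finite-size impression of how well LVQ1 keeps up with the moving target; the reported error fluctuates from seed to seed, which is one reason to prefer an analysis of typical behavior in the limit of large `N`.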
