Domain adaptation bounds for multiple expert systems under concept drift

The ability to learn incrementally from streaming data, whether in an online or a batch setting, is crucial for prediction algorithms operating in environments that generate vast amounts of data, where storing all historical data is impractical or simply infeasible. Learning from streaming data becomes more difficult still when the probability distribution generating the stream evolves over time, rendering models built from previously seen data suboptimal or potentially useless. Ensemble systems that employ multiple classifiers can mitigate this effect, but even then, as the distribution drifts, some classifiers (experts) become less knowledgeable about the current prediction domain than others. A further complication arises when labeled data from the prediction (target) domain are not immediately available, a condition known as verification latency, which degrades prediction on the target domain. In this work, we derive upper bounds on the loss, holding with high probability, of a multiple expert system trained in such a nonstationary environment with verification latency. Furthermore, we show why a single model selection strategy can lead to undesirable results when learning in such nonstationary streaming settings. We support our analytical results with experiments on simulated as well as real-world data sets, comparing several ensemble approaches against a single model.
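To give the flavor of the analysis, the following is a minimal sketch of the single-source domain adaptation bound of Ben-David et al. (2010) on which analyses of this kind typically build; the symbols (source and target errors, the H-delta-H divergence between the two domains, and the joint optimal error lambda) are standard from that line of work, and the paper's actual bounds for weighted expert ensembles under verification latency extend, rather than reproduce, this form:

```latex
% Single-source bound in the style of Ben-David et al. (2010): for every
% hypothesis h in a class H of VC dimension d, with probability at least
% 1 - delta over a labeled source sample of size m,
\epsilon_T(h) \;\le\; \hat{\epsilon}_S(h)
  \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}\!\left(\mathcal{D}_S, \mathcal{D}_T\right)
  \;+\; \lambda
  \;+\; O\!\left(\sqrt{\frac{d \log m + \log(1/\delta)}{m}}\right),
\qquad
\lambda \;=\; \min_{h' \in \mathcal{H}} \bigl[\epsilon_S(h') + \epsilon_T(h')\bigr].
```

In the multiple expert setting, the single source term becomes a convex, alpha-weighted combination of each expert's source error and its divergence from the target domain, so the choice of expert weights directly controls how tight the bound remains as the stream drifts.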
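On the algorithmic side, the sketch below is a hypothetical, minimal multiple expert system of the general kind compared in the experiments, loosely in the spirit of discounted expert weighting and dynamic weighted majority; the class name DiscountedExpertEnsemble, the discount parameter, and the exp(-error) weighting are illustrative assumptions, not the paper's method. Each newly labeled batch trains a fresh expert, and prediction is a vote weighted by each expert's exponentially discounted historical error:

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

class DiscountedExpertEnsemble:
    """Illustrative multiple-expert learner for drifting streams.

    One expert is trained per labeled batch; predictions are a vote
    weighted by each expert's exponentially discounted past error, so
    experts that were accurate on recent batches dominate.
    """

    def __init__(self, base=None, discount=0.7):
        self.base = base if base is not None else DecisionTreeClassifier(max_depth=5)
        self.discount = discount     # in (0, 1); lower values forget faster
        self.experts = []
        self.errors = []             # discounted running error per expert

    def partial_fit(self, X, y):
        # Score existing experts on the newly labeled batch first: under
        # verification latency, these labels are the freshest ground truth.
        for i, h in enumerate(self.experts):
            batch_err = float(np.mean(h.predict(X) != y))
            self.errors[i] = (self.discount * self.errors[i]
                              + (1.0 - self.discount) * batch_err)
        # Train a fresh expert on the batch; it starts at zero error, i.e.
        # maximal weight (a design choice, as in DWM-style methods).
        self.experts.append(clone(self.base).fit(X, y))
        self.errors.append(0.0)
        return self

    def predict(self, X):
        w = np.exp(-np.asarray(self.errors))
        w /= w.sum()                 # normalized expert weights
        votes = np.stack([h.predict(X) for h in self.experts])   # (E, N)
        classes = np.unique(votes)
        # score[c, n] = sum_e w[e] * 1{expert e predicts class c on sample n}
        score = np.stack([(votes == c).T @ w for c in classes])  # (C, N)
        return classes[np.argmax(score, axis=0)]

# Usage on a batch stream (hypothetical):
#   ens = DiscountedExpertEnsemble()
#   for X_t, y_t in stream:          # labels y_t arrive with latency
#       if ens.experts:
#           y_hat = ens.predict(X_t)
#       ens.partial_fit(X_t, y_t)
```

A single model selection strategy would instead retain only the currently best-scoring expert; under gradual drift with delayed labels, the experts it discards can be precisely the ones whose domains the stream later revisits, which is the failure mode the paper's analysis addresses.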
