Data stream classification using active learned neural networks

Abstract Due to the variety of modern real-life tasks in which the analyzed data is often not a static set, data stream mining has gained substantial attention from the machine learning community. The main property of such systems is the large amount of data arriving in a sequential manner, creating an endless stream of objects. Taking into consideration limited resources such as memory and computational power, it is widely accepted that each instance can be processed at most once and is not stored, making re-evaluation impossible. In the following work, we focus on the data stream classification task, where the parameters of a classification model may vary over time, so the model should be able to adapt to these changes. This requires a forgetting mechanism ensuring that outdated samples do not impact the model. The most popular approaches are based on so-called windowing, which requires storing a batch of objects; when new examples arrive, the least relevant ones are forgotten. The objects in the new window are used to retrain the model, which is cumbersome, especially for online learners, and contradicts the principle of processing each object at most once. Therefore, this work employs the built-in forgetting mechanism of neural networks. Additionally, to reduce the need for expensive (and sometimes even impossible) object labeling, we focus on active learning, which asks for labels only for interesting examples that are crucial for appropriate model updating. The characteristics of the proposed methods were evaluated through computer experiments performed over a diverse pool of data streams. Their results confirmed the usefulness of the proposed strategy.
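To make the general idea concrete, the sketch below shows a minimal test-then-train (prequential) loop in which an online neural network is updated incrementally and labels are requested only for uncertain examples. This is an illustrative assumption-laden sketch, not the paper's exact algorithm: the scikit-learn MLPClassifier, the synthetic stream, the hidden-layer size, and the 0.75 uncertainty threshold are all placeholder choices introduced here for demonstration.

```python
# Minimal sketch (not the paper's exact method): an online neural network
# updated with partial_fit, combined with uncertainty-based active learning.
# Incremental weight updates serve as the implicit forgetting mechanism;
# each object is processed once and then discarded.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Stand-in for a data stream: a synthetic classification problem.
X, y = make_classification(n_samples=5000, n_features=10, random_state=42)
classes = np.unique(y)

clf = MLPClassifier(hidden_layer_sizes=(20,), learning_rate_init=0.01,
                    random_state=42)
threshold = 0.75          # query a label when max class probability falls below this
budget_used, correct, seen = 0, 0, 0

# Warm-up on a small initial chunk so that predictions are available.
clf.partial_fit(X[:50], y[:50], classes=classes)

for x_t, y_t in zip(X[50:], y[50:]):
    x_t = x_t.reshape(1, -1)
    # Test-then-train (prequential) evaluation: predict first, learn afterwards.
    y_pred = clf.predict(x_t)[0]
    correct += int(y_pred == y_t)
    seen += 1
    # Active learning: ask for the true label only for uncertain examples.
    confidence = clf.predict_proba(x_t).max()
    if confidence < threshold:
        budget_used += 1
        clf.partial_fit(x_t, [y_t])   # single incremental update, object then forgotten

print(f"accuracy: {correct / seen:.3f}, labels queried: {budget_used}")
```

In this setup no window of past objects is retained: adaptation happens solely through the network's incremental updates on the queried examples, which is the property the abstract contrasts with retraining-based windowing approaches.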
