A Reservoir of Adaptive Algorithms for Online Learning from Evolving Data Streams

Continuous change and development are essential aspects of evolving environments and applications, including, but not limited to, smart cities, the military, medicine, nuclear reactors, self-driving cars, aviation, and aerospace. The fundamental characteristics of such environments may evolve and, if no action is taken, lead to dangerous consequences, e.g., putting people’s lives at stake. Therefore, learning systems need to apply intelligent algorithms to monitor changes in their environments and update themselves effectively. Further, the performance of learning algorithms may fluctuate because the incoming data continuously evolves. That is, a learning approach that is efficient now may become obsolete after a change in the data or environment. Hence, the question ‘How can a learning algorithm remain efficient over time on evolving data?’ has to be addressed. In this thesis, we make two contributions to address the challenges described above. In the machine learning literature, the phenomenon of (distributional) change in data is known as concept drift. Concept drift may shift decision boundaries and cause a decline in accuracy. Learning algorithms therefore have to detect concept drift in evolving data streams and replace their predictive models accordingly. To address this challenge, adaptive learners have been devised that may use drift detection methods to locate the drift points in dynamic and changing data streams. A drift detection method that discovers drift points quickly, with the lowest false positive and false negative rates, is preferred. A false positive is an alarm raised when no concept drift has occurred, and a false negative is a concept drift for which no alarm is raised.
In this thesis, we introduce three algorithms, called the Fast Hoeffding Drift Detection Method (FHDDM), the Stacking Fast Hoeffding Drift Detection Method (FHDDMS), and the McDiarmid Drift Detection Methods (MDDMs), for detecting drift points with minimum delay and minimum false positive and false negative rates. FHDDM is a sliding window-based algorithm that applies Hoeffding’s inequality (Hoeffding, 1963) to detect concept drift. FHDDM slides its window over the prediction results, which are either 1 (for a correct prediction) or 0 (for a wrong prediction). As it does so, it compares the mean of the elements inside the window with the maximum mean observed so far; a difference between the two means that exceeds a threshold derived from Hoeffding’s inequality indicates the occurrence of concept drift. FHDDMS extends FHDDM by sliding multiple windows over its entries to improve the detection delay and the false negative rate. In contrast to FHDDM and FHDDMS, the MDDM variants assign weights to their entries, with higher weights given to the most recent entries in the sliding window, for faster detection of concept drift. The rationale is that recent examples best reflect the current situation, so placing higher weights on the latest entries may allow concept drift to be detected sooner. An MDDM algorithm bounds the difference between the weighted mean of the elements in the sliding window and the maximum weighted mean seen so far, using McDiarmid’s inequality (McDiarmid, 1989). Eventually, it alarms for concept
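The window-based comparison described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class and function names, the window size of 25, the confidence parameter `delta`, the reset after an alarm, and the arithmetic weighting scheme (step `diff`) are all assumptions made for illustration. The thresholds use the standard forms of Hoeffding's and McDiarmid's inequalities for values bounded in [0, 1].

```python
import math
from collections import deque


class FHDDMSketch:
    """Illustrative sliding-window detector; names and defaults are assumptions."""

    def __init__(self, window_size=25, delta=1e-6):
        self.window = deque(maxlen=window_size)
        # Hoeffding threshold for the mean of n values bounded in [0, 1].
        self.epsilon = math.sqrt(math.log(1.0 / delta) / (2.0 * window_size))
        self.max_mean = 0.0

    def add(self, correct):
        """Feed 1 (correct prediction) or 0 (wrong); return True on drift."""
        self.window.append(1 if correct else 0)
        if len(self.window) < self.window.maxlen:
            return False  # no decision until the window is full
        mean = sum(self.window) / len(self.window)
        self.max_mean = max(self.max_mean, mean)
        if self.max_mean - mean >= self.epsilon:
            # Assumed behaviour: forget everything after raising an alarm.
            self.window.clear()
            self.max_mean = 0.0
            return True
        return False


def first_alarm(stream, **kwargs):
    """Return the 0-based index of the first alarm, or None if there is none."""
    detector = FHDDMSketch(**kwargs)
    for i, bit in enumerate(stream):
        if detector.add(bit):
            return i
    return None


def weighted_threshold(window_size, delta=1e-6, diff=0.01):
    """McDiarmid-style threshold for a weighted mean (hypothetical weighting).

    Weights grow arithmetically toward the newest entry: 1, 1 + diff, ...
    With diff = 0 all weights are equal and the Hoeffding threshold is recovered.
    """
    weights = [1.0 + i * diff for i in range(window_size)]
    total = sum(weights)
    sum_sq = sum((w / total) ** 2 for w in weights)
    return math.sqrt(sum_sq / 2.0 * math.log(1.0 / delta))
```

For example, `first_alarm([1] * 100 + [0] * 100)` feeds a stream whose accuracy collapses after 100 correct predictions and returns the position at which the sketch raises its first alarm, while a stream of only correct predictions yields `None`.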

[1]  S. W. Roberts Control chart tests based on geometric moving averages , 2000 .

[2]  William Nick Street,et al.  A streaming ensemble algorithm (SEA) for large-scale classification , 2001, KDD '01.

[3]  Geoff Holmes,et al.  MOA: Massive Online Analysis , 2010, J. Mach. Learn. Res..

[4]  Bhavani M. Thuraisingham Data mining for security applications , 2004, ICMLA.

[5]  Richard Granger,et al.  Incremental Learning from Noisy Data , 1986, Machine Learning.

[6]  Geoff Hulten,et al.  Mining time-changing data streams , 2001, KDD '01.

[7]  João Gama,et al.  Learning with Drift Detection , 2004, SBIA.

[8]  Herna Viktor,et al.  McDiarmid Drift Detection Methods for Evolving Data Streams , 2017, 2018 International Joint Conference on Neural Networks (IJCNN).

[9]  Yun Sing Koh,et al.  Detecting concept change in dynamic data streams , 2013, Machine Learning.

[10]  Rudolf Kruse,et al.  Enhancing Text Classification to Improve Information Filtering , 2001 .

[11]  Ivan Bratko,et al.  Machine Learning by Function Decomposition , 1997, ICML.

[12]  John D. Lafferty,et al.  Dynamic topic models , 2006, ICML.

[13]  Olubukola Olaitan,et al.  SCUT-DS: Methodologies for Learning in Imbalanced Data Streams , 2018 .

[14]  Wagner Meira,et al.  Understanding temporal aspects in document classification , 2008, WSDM '08.

[15]  Svetha Venkatesh,et al.  Using multiple windows to track concept drift , 2004, Intell. Data Anal..

[16]  P. Cortez,et al.  A data mining approach to predict forest fires using meteorological data , 2007 .

[17]  Josef Kittler,et al.  Challenges and Research Directions for Adaptive Biometric Recognition Systems , 2009, ICB.

[18]  Albert Bifet,et al.  Sentiment Knowledge Discovery in Twitter Streaming Data , 2010, Discovery Science.

[19]  Ameet Talwalkar,et al.  Foundations of Machine Learning , 2012, Adaptive computation and machine learning.

[20]  João Pedro Carvalho Leal Mendes Moreira,et al.  Travel time prediction for the planning of mass transit companies: a machine learning approach , 2008 .

[21]  Herna L. Viktor,et al.  Intelligent Adaptive Ensembles for Data Stream Mining: A High Return on Investment Approach , 2015, NFMCP.

[22]  Niall M. Adams,et al.  The impact of changing populations on classifier performance , 1999, KDD '99.

[23]  Herna Viktor,et al.  Reservoir of diverse adaptive learners and stacking fast hoeffding drift detection methods for evolving data streams , 2017, Machine Learning.

[24]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[25]  Gerhard Widmer,et al.  Learning in the presence of concept drift and hidden contexts , 2004, Machine Learning.

[26]  Dimitris K. Tasoulis,et al.  Exponentially weighted moving average charts for detecting concept drift , 2012, Pattern Recognit. Lett..

[27]  Kuo-Wei Hsu,et al.  A Theoretical Analysis of Why Hybrid Ensembles Work , 2017, Comput. Intell. Neurosci..

[28]  Katarzyna Musial,et al.  Next challenges for adaptive learning systems , 2012, SKDD.

[29]  Mohamed Medhat Gaber,et al.  Pocket Data Mining , 2014 .

[30]  Vadlamani Ravi,et al.  Bankruptcy prediction in banks and firms via statistical and intelligent techniques - A review , 2007, Eur. J. Oper. Res..

[31]  Richard W. Hamming,et al.  Error detecting and error correcting codes , 1950 .

[32]  Denis J. Dean,et al.  Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables , 1999 .

[33]  Herna L. Viktor,et al.  A Framework for Classification in Data Streams Using Multi-strategy Learning , 2016, DS.

[34]  Geoffrey I. Webb,et al.  Encyclopedia of Machine Learning , 2011, Encyclopedia of Machine Learning.

[35]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[36]  Christoph Meinel,et al.  Deep Learning for Medical Image Analysis , 2018, Journal of Pathology Informatics.

[37]  Namsik Chang,et al.  Dynamics of Modeling in Data Mining: Interpretive Approach to Bankruptcy Prediction , 1999, J. Manag. Inf. Syst..

[38]  Seppo Puuronen,et al.  Comparing Classifier Combining Techniques for Mobile-Masquerader Detection , 2007, The Second International Conference on Availability, Reliability and Security (ARES'07).

[39]  David H. Wolpert,et al.  No free lunch theorems for optimization , 1997, IEEE Trans. Evol. Comput..

[40]  Geoff Holmes,et al.  New ensemble methods for evolving data streams , 2009, KDD.

[41]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[42]  Marcos Salganico,et al.  Tolerating concept and sampling shift in lazy learning using prediction error context switching , 1997 .

[43]  Roberto Souto Maior de Barros,et al.  RDDM: Reactive drift detection method , 2017, Expert Syst. Appl..

[44]  Henry A. Kautz,et al.  Learning and inferring transportation routines , 2004, Artif. Intell..

[45]  R. Giacomini,et al.  Detecting and Predicting Forecast Breakdowns , 2006, SSRN Electronic Journal.

[46]  Bhavani M. Thuraisingham,et al.  A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams , 2009, PAKDD.

[47]  Diane J. Cook,et al.  Keeping the Resident in the Loop: Adapting the Smart Home to the User , 2009, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[48]  João Bártolo Gomes,et al.  Where Will You Go? Mobile Data Mining for Next Place Prediction , 2013, DaWaK.

[49]  Natalia Stash,et al.  AHA! The adaptive hypermedia architecture , 2003, HYPERTEXT '03.

[50]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[51]  Arthur L. Samuel,et al.  Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..

[52]  João Gama,et al.  On evaluating stream learning algorithms , 2012, Machine Learning.

[53]  Gerhard Widmer,et al.  Effective Learning in Dynamic Environments by Explicit Context Tracking , 1993, ECML.

[54]  José del Campo-Ávila,et al.  Online and Non-Parametric Drift Detection Methods Based on Hoeffding’s Bounds , 2015, IEEE Transactions on Knowledge and Data Engineering.

[55]  Chong Wang,et al.  Continuous Time Dynamic Topic Models , 2008, UAI.

[56]  Jeffrey W. Seifert Data Mining and Homeland Security: An Overview , 2008 .

[57]  W. Hoeffding Probability Inequalities for Sums of Bounded Random Variables , 1963 .

[58]  Colin McDiarmid,et al.  Surveys in Combinatorics, 1989: On the method of bounded differences , 1989 .

[59]  Geoff Hulten,et al.  Catching up with the Data: Research Issues in Mining Data Streams , 2001, DMKD.

[60]  Stephen R. Garner,et al.  WEKA: The Waikato Environment for Knowledge Analysis , 1996 .

[61]  Juan M. Corchado,et al.  Applying lazy learning algorithms to tackle concept drift in spam filtering , 2007, Expert Syst. Appl..

[62]  Jung-Min Park,et al.  An overview of anomaly detection techniques: Existing solutions and latest technological trends , 2007, Comput. Networks.

[63]  Lior Rokach,et al.  Change Detection in Classification Models Induced from Time Series Data , 2004 .

[64]  Leslie G. Valiant,et al.  A theory of the learnable , 1984, CACM.

[65]  Geoff Holmes,et al.  Fast Perceptron Decision Tree Learning from Evolving Data Streams , 2010, PAKDD.

[66]  Barbara Caputo,et al.  Incremental learning for place recognition in dynamic environments , 2007, 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[67]  Kyriakos Mouratidis,et al.  Continuous Nearest Neighbor Queries over Sliding Windows , 2007 .

[68]  Thorsten Joachims,et al.  Detecting Concept Drift with Support Vector Machines , 2000, ICML.

[69]  Peter A. Flach,et al.  Machine Learning - The Art and Science of Algorithms that Make Sense of Data , 2012 .

[70]  Ludmila I. Kuncheva,et al.  Classifier Ensembles for Detecting Concept Change in Streaming Data: Overview and Perspectives , 2008 .

[71]  Philip S. Yu,et al.  A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions , 2007, SDM.

[72]  David G. Stork,et al.  Pattern Classification (2nd ed.) , 1999 .

[73]  Herna L. Viktor,et al.  Dynamic adaptation of online ensembles for drifting data streams , 2017, Journal of Intelligent Information Systems.

[74]  Andrea De Mauro,et al.  A formal definition of Big Data based on its essential features , 2016 .

[75]  Roberto Souto Maior de Barros,et al.  A large-scale comparison of concept drift detectors , 2018, Inf. Sci..

[76]  Roberto Souto Maior de Barros,et al.  A Lightweight Concept Drift Detection Ensemble , 2015, 2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI).

[77]  Albert Bifet Classifier Concept Drift Detection and the Illusion of Progress , 2017, ICAISC.

[78]  Koichiro Yamauchi,et al.  Detecting Concept Drift Using Statistical Testing , 2007, Discovery Science.

[79]  Ricard Gavaldà,et al.  Learning from Time-Changing Data with Adaptive Windowing , 2007, SDM.

[80]  Mohamed Medhat Gaber,et al.  Resource-aware Online Data Mining in Wireless Sensor Networks , 2007, 2007 IEEE Symposium on Computational Intelligence and Data Mining.

[81]  Constantinos S. Hilas,et al.  Designing an expert system for fraud detection in private telecommunications networks , 2009, Expert Syst. Appl..

[82]  Pedro M. Domingos,et al.  Mining massive data streams , 2005 .

[83]  Licia Capra,et al.  kNN CF: a temporal social network , 2008, RecSys '08.

[84]  Yoav Freund,et al.  A Short Introduction to Boosting , 1999 .

[85]  Indre Zliobaite,et al.  Learning under Concept Drift: an Overview , 2010, ArXiv.

[86]  Andreas D. Lattner,et al.  Sequential Pattern Mining for Situation and Behavior Prediction in Simulated Robotic Soccer , 2005, RoboCup.

[87]  Saso Dzeroski,et al.  Learning model trees from evolving data streams , 2010, Data Mining and Knowledge Discovery.

[88]  João Gama,et al.  Decision trees for mining data streams , 2006, Intell. Data Anal..

[89]  A. Bifet,et al.  Early Drift Detection Method , 2005 .

[90]  Antanas Verikas,et al.  Hybrid and ensemble-based soft computing techniques in bankruptcy prediction: a survey , 2010, Soft Comput..

[91]  Herna L. Viktor,et al.  Fast Hoeffding Drift Detection Method for Evolving Data Streams , 2016, ECML/PKDD.

[92]  Michaela M. Black,et al.  Detecting and Adapting to Concept Drift in Bioinformatics , 2004, KELSI.

[93]  Jeffrey Scott Vitter,et al.  Random sampling with a reservoir , 1985, TOMS.

[94]  Heri Ramampiaro,et al.  High utility drift detection in quantitative data streams , 2018, Knowl. Based Syst..

[95]  Yehuda Koren,et al.  Collaborative filtering with temporal dynamics , 2009, KDD.

[96]  Albert Bifet,et al.  Data Stream Mining: A Practical Approach , 2009 .

[97]  Gerhard Widmer,et al.  Adapting to Drift in Continuous Domains , 2007 .

[98]  Miroslav Kubat,et al.  Association mining in time-varying domains , 2005, Intell. Data Anal..

[99]  Aldo Napoli,et al.  The automatic identification system of maritime accident risk using rule-based reasoning , 2012, 2012 7th International Conference on System of Systems Engineering (SoSE).

[100]  Ewan Klein,et al.  Natural Language Processing with Python , 2009 .

[101]  Sung-Bae Cho,et al.  Activity recognition based on wearable sensors using selection/fusion hybrid ensemble , 2011, 2011 IEEE International Conference on Systems, Man, and Cybernetics.

[102]  Jun Zhou,et al.  Prediction and Change Detection in Sequential Data for Interactive Applications , 2008, AAAI.

[103]  Yoav Freund,et al.  Large Margin Classification Using the Perceptron Algorithm , 1998, COLT.

[104]  Ralf Klinkenberg Meta-Learning, Model Selection, and Example Selection in Machine Learning Domains with Concept Drift , 2005, LWA.

[105]  Ron Kohavi,et al.  Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid , 1996, KDD.

[106]  Michaela M. Black,et al.  Classification of Customer Call Data in the Presence of Concept Drift and Noise , 2002, Soft-Ware.

[107]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[108]  Marion G. Ceruti,et al.  Data Management Challenges and Development for Military Information Systems , 2003, IEEE Trans. Knowl. Data Eng..

[109]  Robert C. Holte,et al.  Very Simple Classification Rules Perform Well on Most Commonly Used Datasets , 1993, Machine Learning.

[110]  Heri Ramampiaro,et al.  Applying temporal dependence to detect changes in streaming data , 2018, Applied Intelligence.

[112]  Heiko Wersing,et al.  Incremental on-line learning: A review and comparison of state of the art algorithms , 2018, Neurocomputing.

[113]  Ralf Klinkenberg,et al.  Learning drifting concepts: Example selection vs. example weighting , 2004, Intell. Data Anal..

[114]  Michèle Basseville,et al.  Detection of abrupt changes: theory and application , 1993 .

[115]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[116]  Wee Keong Ng,et al.  A survey on data stream clustering and classification , 2015, Knowledge and Information Systems.

[117]  Ricard Gavaldà,et al.  Kalman Filters and Adaptive Windows for Learning in Data Streams , 2006, Discovery Science.

[118]  Albert Bifet,et al.  Adaptive learning and mining for data streams and frequent patterns , 2009, SKDD.

[119]  Smart Adaptive Systems : State of the Art and Future Directions of Research , 2001 .

[120]  Takaaki Ohishi,et al.  A Hybrid Ensemble Model Applied to the Short-Term Load Forecasting Problem , 2006, The 2006 IEEE International Joint Conference on Neural Network Proceedings.

[121]  Roberto Souto Maior de Barros,et al.  Wilcoxon Rank Sum Test Drift Detector , 2018, Neurocomputing.

[122]  Eyke Hüllermeier,et al.  Open challenges for data stream mining research , 2014, SKDD.

[123]  David J. Hand,et al.  Statistical fraud detection: A review , 2002 .

[124]  Varun Chandola,et al.  Anomaly detection: A survey , 2009, CSUR.

[125]  Carlos Roberto Sanquetta,et al.  On the use of data mining for estimating carbon storage in the trees , 2013, Carbon Balance and Management.

[126]  João Gama,et al.  Fading histograms in detecting distribution and concept changes , 2017, International Journal of Data Science and Analytics.

[127]  Ke Shi,et al.  Data Mining Techniques for Wireless Sensor Networks: A Survey , 2013, Int. J. Distributed Sens. Networks.

[128]  Gregory Z. Grudic,et al.  Learning terrain segmentation with classifier ensembles for autonomous robot navigation in unstructured environments , 2009, J. Field Robotics.

[129]  Claude Sammut,et al.  Extracting Hidden Context , 1998, Machine Learning.

[130]  Mohamed Medhat Gaber,et al.  Data stream mining in ubiquitous environments: state‐of‐the‐art and current directions , 2014, WIREs Data Mining Knowl. Discov..

[131]  Indre Zliobaite,et al.  How good is the Electricity benchmark for evaluating concept drift adaptation , 2013, ArXiv.

[132]  João Gama,et al.  Issues in evaluation of stream learning algorithms , 2009, KDD.

[133]  Melita Hadzagic,et al.  Information Mining Technologies to Enable Discovery of Actionable Intelligence to Facilitate Maritime Situational Awareness: I-MINE , 2013 .

[134]  Christophe G. Giraud-Carrier,et al.  A Note on the Utility of Incremental Learning , 2000, AI Commun..

[135]  Darryl Charles,et al.  Player-Centred Game Design : Player Modelling and Adaptive Digital Games , 2005 .

[136]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[137]  Ingrid Renz,et al.  Adaptive Information Filtering : Learning Drifting Concepts , 1998 .

[138]  Gillian Dobbie,et al.  Detecting Volatility Shift in Data Streams , 2014, 2014 IEEE International Conference on Data Mining.

[139]  João Gama,et al.  A survey on concept drift adaptation , 2014, ACM Comput. Surv..

[140]  John Gantz,et al.  The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East , 2012 .

[141]  Ali A. Ghorbani,et al.  An incremental frequent structure mining framework for real-time alert correlation , 2009, Comput. Secur..

[142]  Marcin Budka,et al.  Towards cost-sensitive adaptation: When is it worth updating your predictive model? , 2015, Neurocomputing.

[143]  Stuart J. Russell,et al.  Online bagging and boosting , 2005, 2005 IEEE International Conference on Systems, Man and Cybernetics.

[144]  Mykola Pechenizkiy,et al.  Dynamic integration of classifiers for handling concept drift , 2008, Inf. Fusion.

[145]  Pat Langley,et al.  Induction of One-Level Decision Trees , 1992, ML.

[146]  Yun Sing Koh,et al.  One Pass Concept Change Detection for Data Streams , 2013, PAKDD.

[147]  Gillian Dobbie,et al.  Drift Detection Using Stream Volatility , 2015, ECML/PKDD.

[148]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[149]  Julie Greensmith,et al.  Immune System Approaches to Intrusion Detection - A Review , 2004, ICARIS.

[150]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[151]  Roberto Souto Maior de Barros,et al.  Concept drift detection based on Fisher's Exact test , 2018, Inf. Sci..

[152]  Richard Weber,et al.  A methodology for dynamic data mining based on fuzzy clustering , 2005, Fuzzy Sets Syst..