Discussion and review on evolving data streams and concept drift adapting

Recent advances in computational intelligent systems have focused on complex problems arising from the dynamic nature of their environments. In an increasing number of real-world applications, data arrive as streams whose underlying distribution may change over time; this phenomenon is known as concept drift. Handling concept drift has become an active research topic spanning several disciplines, such as machine learning, data mining, ubiquitous knowledge discovery, and statistical decision theory. Consequently, a rich body of literature has been devoted to methods and techniques for handling drifting data. However, this literature is fairly dispersed and does not provide guidelines for choosing an appropriate approach for a given application. Hence, the main objective of this survey is to give an accessible account of concept drift issues and related work, in order to help researchers from different disciplines take concept drift handling into account in their applications. The survey covers different facets of existing approaches, prompts discussion, and helps readers identify the key criteria for properly designing their own approach. To this end, it presents a new categorization of the state of the art, together with a critical analysis, future trends, and open challenges.
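To make the notion of concept drift concrete, the following minimal Python sketch (an illustration, not a method from the survey) simulates an abrupt drift. The rule mapping an input x to its label changes from x > 0.5 to x > 0.2 halfway through a synthetic stream. A predictor fixed to the old rule remains perfectly accurate before the change and degrades afterwards, which is the symptom that drift-handling methods aim to detect or adapt to. The stream generator, the threshold values and the window size are hypothetical choices made only for illustration.

```python
import random

def make_stream(n, drift_at, seed=0):
    """Hypothetical synthetic stream of (t, x, y) triples.

    The labelling threshold moves from 0.5 to 0.2 at index drift_at,
    i.e. an abrupt change in P(y | x) while P(x) stays uniform.
    """
    rng = random.Random(seed)
    for t in range(n):
        x = rng.random()
        threshold = 0.5 if t < drift_at else 0.2
        yield t, x, int(x > threshold)

def windowed_accuracy(stream, predict, window):
    """Accuracy of a fixed predictor over consecutive, non-overlapping windows."""
    accs, hits = [], 0
    for t, x, y in stream:
        hits += int(predict(x) == y)
        if (t + 1) % window == 0:
            accs.append(hits / window)
            hits = 0
    return accs

# A model learned before the drift: it encodes the old threshold 0.5
# and is never updated afterwards.
def static_model(x):
    return int(x > 0.5)

accs = windowed_accuracy(make_stream(2000, drift_at=1000), static_model, window=250)
print([round(a, 2) for a in accs])
```

Before the drift every window scores 1.0. After it, inputs with 0.2 < x ≤ 0.5 (about 30% of the stream) are mislabeled by the static model, so windowed accuracy falls to roughly 0.7.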


[43]  Abraham Bernstein,et al.  Entropy-based Concept Shift Detection , 2006, Sixth International Conference on Data Mining (ICDM'06).

[44]  Dimitris K. Tasoulis,et al.  Exponentially weighted moving average charts for detecting concept drift , 2012, Pattern Recognit. Lett..

[45]  Jennifer Widom,et al.  Models and issues in data stream systems , 2002, PODS.

[46]  Matjaz Kukar,et al.  Drifting Concepts as Hidden Factors in Clinical Studies , 2003, AIME.

[47]  Michal Wozniak,et al.  Concept Drift Detection and Model Selection with Simulated Recurrence and Ensembles of Statistical Detectors , 2013, J. Univers. Comput. Sci..

[48]  Xindong Wu,et al.  Active Learning through Adaptive Heterogeneous Ensembling , 2015, IEEE Transactions on Knowledge and Data Engineering.

[49]  João Gama,et al.  Incremental discretization, application to data with concept drift , 2007, SAC '07.

[50]  João Gama,et al.  A survey on concept drift adaptation , 2014, ACM Comput. Surv..

[51]  Ludmila I. Kuncheva,et al.  On the window size for classification in changing environments , 2009, Intell. Data Anal..

[52]  Vasant Honavar,et al.  Learn++: an incremental learning algorithm for supervised neural networks , 2001, IEEE Trans. Syst. Man Cybern. Part C.

[53]  Geoff Holmes,et al.  Evaluation methods and decision theory for classification of streaming data with temporal dependence , 2015, Machine Learning.

[54]  Moamar Sayed Mouchaweh,et al.  Drift detection and monitoring in non-stationary environments , 2014, 2014 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS).

[55]  Ponnuthurai N. Suganthan,et al.  Ensemble Classification and Regression-Recent Developments, Applications and Future Directions [Review Article] , 2016, IEEE Computational Intelligence Magazine.

[56]  Xin Yao,et al.  DDD: A New Ensemble Approach for Dealing with Concept Drift , 2012, IEEE Transactions on Knowledge and Data Engineering.

[57]  Marcus A. Maloof,et al.  Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts , 2007, J. Mach. Learn. Res..

[58]  Anton Dries,et al.  Adaptive concept drift detection , 2009 .

[59]  Jerzy Stefanowski,et al.  Combining block-based and online methods in learning ensembles from concept drifting data streams , 2014, Inf. Sci..

[60]  David A. Cieslak,et al.  A framework for monitoring classifiers’ performance: when and why failure occurs? , 2009, Knowledge and Information Systems.

[61]  S. Muthukrishnan,et al.  Sequential Change Detection on Data Streams , 2007, Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007).

[62]  Mohamed Medhat Gaber,et al.  Knowledge discovery from data streams , 2009, IDA 2009.

[63]  Geoff Hulten,et al.  Mining time-changing data streams , 2001, KDD '01.

[64]  Xin Yao,et al.  The Impact of Diversity on Online Ensemble Learning in the Presence of Concept Drift , 2010, IEEE Transactions on Knowledge and Data Engineering.

[65]  Geoff Holmes,et al.  Accurate Ensembles for Data Streams: Combining Restricted Hoeffding Trees using Stacking , 2010, ACML.

[66]  Svetha Venkatesh,et al.  Using multiple windows to track concept drift , 2004, Intell. Data Anal..

[67]  Anton Dries,et al.  Adaptive concept drift detection , 2009, SDM.

[68]  Mohamed Limam,et al.  An ensemble method for concept drift in nonstationary environment , 2013 .

[69]  Ming-Syan Chen,et al.  Catching the Trend: A Framework for Clustering Concept-Drifting Categorical Data , 2009, IEEE Transactions on Knowledge and Data Engineering.

[70]  Abdelhamid Bouchachia,et al.  A review of smart homes in healthcare , 2015, J. Ambient Intell. Humaniz. Comput..

[71]  Shai Ben-David,et al.  Detecting Change in Data Streams , 2004, VLDB.

[72]  Adel Aloraini,et al.  Penalized ensemble feature selection methods for hidden associations in time series environments case study: equities companies in Saudi Stock Exchange Market , 2015, Evol. Syst..

[73]  Ismael Lopez-Juarez,et al.  On-line incremental learning for unknown conditions during assembly operations with industrial robots , 2015, Evol. Syst..

[74]  Roberto Souto Maior de Barros,et al.  A comparative study on concept drift detectors , 2014, Expert Syst. Appl..

[75]  Philip S. Yu,et al.  Mining concept-drifting data streams using ensemble classifiers , 2003, KDD '03.

[76]  Jerzy Stefanowski,et al.  Reacting to Different Types of Concept Drift: The Accuracy Updated Ensemble Algorithm , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[77]  Marcus A. Maloof,et al.  Paired Learners for Concept Drift , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[78]  Edwin Lughofer,et al.  Evolving Fuzzy Systems - Methodologies, Advanced Concepts and Applications , 2011, Studies in Fuzziness and Soft Computing.

[79]  Jiye Liang,et al.  A Framework for Clustering Categorical Time-Evolving Data , 2010, IEEE Transactions on Fuzzy Systems.

[80]  Bartosz Krawczyk,et al.  One-class classifiers with incremental learning and forgetting for data streams with concept drift , 2015, Soft Comput..

[81]  Xin Yao,et al.  Online Class Imbalance Learning and its Applications in Fault Detection , 2013, Int. J. Comput. Intell. Appl..

[82]  Koichiro Yamauchi,et al.  Detecting Concept Drift Using Statistical Testing , 2007, Discovery Science.

[83]  Ricard Gavaldà,et al.  Learning from Time-Changing Data with Adaptive Windowing , 2007, SDM.

[84]  Bartosz Krawczyk,et al.  Combined classifier based on feature space partitioning , 2012, Int. J. Appl. Math. Comput. Sci..

[85]  Hamid Beigy,et al.  New Drift Detection Method for Data Streams , 2011, ICAIS.

[86]  Gregory Ditzler,et al.  Learning in Nonstationary Environments: A Survey , 2015, IEEE Computational Intelligence Magazine.

[87]  Plamen P. Angelov,et al.  Evolving fuzzy systems for data streams: a survey , 2011, WIREs Data Mining Knowl. Discov..