IBLStreams: a system for instance-based classification and regression on data streams

To be useful in dynamically evolving environments, machine learning methods must meet several requirements: they must analyze incoming data in an online, incremental manner, observe tight time and memory constraints, and respond appropriately to changes in the data characteristics and underlying distributions. This paper advocates an instance-based learning algorithm for that purpose, for both classification and regression problems. The algorithm has a number of desirable properties that, at least taken as a whole, are not shared by existing alternatives. Notably, the method is very flexible and can therefore adapt quickly to an evolving environment, a point of utmost importance in the data stream context. At the same time, the algorithm is relatively robust and thus applicable to streams with different characteristics.
