Analyzing the Impact of Feature Drifts in Streaming Learning

Learning from data streams requires efficient algorithms capable of deriving a model accordingly to the arrival of new instances. Data streams are by definition unbounded sequences of data that are possibly non stationary, i.e. they may undergo changes in data distribution, phenomenon named concept drift. Concept drifts force streaming learning algorithms to detect and adapt to such changes in order to present feasible accuracy throughout time. Nonetheless, most of works presented in the literature do not account for a specific kind of drifts: feature drifts. Feature drifts occur whenever the relevance of an arbitrary attribute changes through time, also impacting the concept to be learned. In this paper we (i) verify the occurrence of feature drift in a publicly available dataset, (ii) present a synthetic data stream generator capable of performing feature drifts and (iii) analyze the impact of this type of drift in stream learning algorithms, enlightening that there is room and the need for dynamic feature selection strategies for data streams.

[1]  William Nick Street,et al.  A streaming ensemble algorithm (SEA) for large-scale classification , 2001, KDD '01.

[2]  Geoff Holmes,et al.  MOA: Massive Online Analysis , 2010, J. Mach. Learn. Res..

[3]  Geoff Hulten,et al.  Mining high-speed data streams , 2000, KDD '00.

[4]  Ricard Gavaldà,et al.  Learning from Time-Changing Data with Adaptive Windowing , 2007, SDM.

[5]  João Gama,et al.  A survey on concept drift adaptation , 2014, ACM Comput. Surv..

[6]  Mohamed Medhat Gaber,et al.  Knowledge discovery from data streams , 2009, IDA 2009.

[7]  Li Wan,et al.  Heterogeneous Ensemble for Feature Drifts in Data Streams , 2012, PAKDD.

[8]  Bhavani M. Thuraisingham,et al.  Classification and Novel Class Detection in Concept-Drifting Data Streams under Time Constraints , 2011, IEEE Transactions on Knowledge and Data Engineering.

[9]  João Gama,et al.  Issues in evaluation of stream learning algorithms , 2009, KDD.

[10]  Geoff Holmes,et al.  New ensemble methods for evolving data streams , 2009, KDD.

[11]  Jesús S. Aguilar-Ruiz,et al.  Knowledge discovery from data streams , 2009, Intell. Data Anal..

[12]  William W. Cohen,et al.  Single-pass online learning: performance, voting schemes and online feature selection , 2006, KDD '06.

[13]  Jean Paul Barddal,et al.  SFNClassifier: a scale-free social network method to handle concept drift , 2014, SAC.

[14]  André Carlos Ponce de Leon Ferreira de Carvalho,et al.  Data stream clustering: A survey , 2013, CSUR.

[15]  W. Hoeffding Probability Inequalities for sums of Bounded Random Variables , 1963 .

[16]  Zhou Zimu,et al.  RSSIからCSIへ:チャネルレスポンスによるインドア・ローカリゼーション , 2013 .

[17]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.