Probabilistic Active Learning in Datastreams

In recent years, stream-based active learning has become an intensively investigated research topic. In this work, we propose a new algorithm for stream-based active learning that decides immediately whether to acquire a label (selective sampling). To this end, we extend our pool-based Probabilistic Active Learning framework to data streams. In particular, we complement the notion of usefulness within a topological space ("spatial usefulness") with the concept of "temporal usefulness". To actively select the instances for which labels must be acquired, we introduce the Balanced Incremental Quantile Filter (BIQF), an algorithm that assesses the usefulness of instances in a sliding window and ensures that predefined budget restrictions are met within a given tolerance window. We compare our approach to other stream-based active learning approaches and demonstrate the competitiveness of our method.
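The filtering idea described above — acquire a label only when an instance's usefulness score ranks in the top budget-fraction of a sliding window, while a balancing mechanism keeps the realised acquisition rate near the budget within a tolerance window — can be sketched as follows. This is an illustrative reconstruction under stated assumptions (quantile threshold on a sliding window, a simple over/under-budget balancing rule), not the authors' reference implementation of BIQF; the class and parameter names are hypothetical.

```python
from collections import deque

class QuantileFilterSketch:
    """Illustrative sliding-window quantile filter for selective sampling.

    Keeps the most recent usefulness scores in a window and acquires a
    label when the current score reaches the (1 - budget)-quantile of
    that window. A balancing term compares the acquisition rate over a
    recent tolerance window with the budget and tightens or relaxes the
    threshold comparison accordingly.
    """

    def __init__(self, budget=0.1, window_size=100, tolerance_window=50):
        self.budget = budget                      # target fraction of labels to buy
        self.scores = deque(maxlen=window_size)   # sliding window of scores
        self.decisions = deque(maxlen=tolerance_window)  # recent acquire decisions

    def acquire(self, usefulness):
        """Return True if a label should be acquired for this instance."""
        self.scores.append(usefulness)
        ranked = sorted(self.scores)
        # index of the (1 - budget)-quantile within the window
        k = min(int((1.0 - self.budget) * len(ranked)), len(ranked) - 1)
        threshold = ranked[k]
        # balancing: realised spending rate over the tolerance window
        spent = sum(self.decisions) / max(len(self.decisions), 1)
        if spent > self.budget:
            decision = usefulness > threshold     # over budget: be stricter
        else:
            decision = usefulness >= threshold    # under budget: be lenient
        self.decisions.append(int(decision))
        return decision
```

In this sketch the quantile is recomputed by sorting the window, which is fine for small windows; the paper's setting would call for an incremental quantile structure over the sliding window to keep per-instance cost low.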
