Paper information - Change with Delayed Labeling: When is it Detectable?

Handling changes over time in supervised learning (concept drift) has recently received a great deal of attention, and a number of adaptive learning strategies have been developed. Most of them make the optimistic assumption that new labels become available immediately. In real sequential classification tasks this is often unrealistic because of task-specific labeling delays or labeling costs. We address the problem of change detectability given that the new labels are not available. In this analytical study we look at the space of changes from a probabilistic perspective to analyze which changes are detectable without seeing the labels and which are not. We conduct a range of experiments on real-life data with simulated and natural changes to explore this detectability issue. We propose a computationally friendly detection technique that monitors a stream of classifier outputs. We demonstrate analytically and experimentally which types of changes can be detected when the labels for the new data are not available.
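The abstract does not say which statistic the proposed technique applies to the stream of classifier outputs. As a rough illustration of the general idea only, the sketch below compares a fixed reference window of classifier scores with a sliding window of recent scores using a two-sample Kolmogorov-Smirnov test. The choice of test, the window size, the significance level, and the names `ks_statistic`, `ks_threshold`, and `detect_change` are all assumptions made for this example and are not taken from the paper.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    d = 0.0
    # The largest gap is attained at one of the observed values.
    for v in ref + cur:
        f_ref = bisect_right(ref, v) / len(ref)
        f_cur = bisect_right(cur, v) / len(cur)
        d = max(d, abs(f_ref - f_cur))
    return d


def ks_threshold(n, m, alpha=0.01):
    """Asymptotic critical value of the two-sample KS test at level alpha."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))


def detect_change(scores, window=100, alpha=0.01):
    """Return the index of the first score at which a change is flagged, else None.

    Assumption: the first `window` scores come from the stable period and
    serve as the reference sample. No labels are used anywhere.
    """
    reference = scores[:window]
    limit = ks_threshold(window, window, alpha)
    for end in range(2 * window, len(scores) + 1):
        current = scores[end - window:end]
        if ks_statistic(reference, current) > limit:
            return end - 1
    return None


# Synthetic, hand-made scores (not data from the paper): the classifier
# outputs low scores for 600 steps, then shifts to high scores.
before = [(i % 10) / 20 for i in range(600)]
after = [0.5 + (i % 10) / 20 for i in range(300)]
print(detect_change(before + after))
```

On this synthetic stream the detector fires some time after index 600, once enough shifted scores have entered the sliding window. A detector of this kind can only react to changes that alter the distribution of the classifier outputs; a change that leaves the outputs untouched stays invisible without labels, which is the detectability question the abstract raises. Because the same reference window is tested repeatedly, the nominal significance level is not the overall false-alarm rate.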

Indrė Žliobaitė
