Polishing the Right Apple: Anytime Classification Also Benefits Data Streams with Constant Arrival Times

Classification of items taken from data streams requires algorithms that operate in time-sensitive and computationally constrained environments. Often, the time available for classification is not known a priori and may change as a consequence of external circumstances. Many traditional algorithms cannot provide satisfactory performance while supporting the highly variable response times that such applications demand. In such contexts, anytime algorithms, which can trade time for accuracy, have proven exceptionally useful and constitute an area of increasing research activity. Previous techniques for improving anytime classification have generally been concerned with optimizing the probability of correctly classifying individual objects. However, as we shall see, serially optimizing the probability of correctly classifying individual objects K times generally gives inferior results to batch optimizing the probability of correctly classifying the K objects together. In this work, we show that this simple observation can be exploited to improve overall classification performance by using an anytime framework to allocate resources among a set of objects buffered from a fast-arriving stream. Our ideas are independent of object arrival behavior and, perhaps counterintuitively, even on data streams with constant arrival rates our technique exhibits a marked improvement in performance. We demonstrate the utility of our approach with extensive experimental evaluations on a wide range of diverse datasets.
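
To make the contrast between per-object and batch budget allocation concrete, the sketch below pairs an interruptible 1-nearest-neighbor classifier with two policies for spending a shared time budget on a buffer of K stream objects: an even per-object split, and a greedy policy that always refines the object we are currently least confident about. This is a minimal illustration under stated assumptions, not the algorithm from the paper; the AnytimeNN class, the 1/(1 + distance) confidence heuristic, and the greedy allocation policy are choices made here purely for readability.

```python
# Hedged sketch (not the authors' published code): per-object vs. batch anytime
# allocation for K objects buffered from a stream. The confidence heuristic and
# the greedy policy are illustrative assumptions.
import numpy as np

class AnytimeNN:
    """1-NN classifier that can be interrupted after any number of distance computations."""

    def __init__(self, train_x, train_y):
        self.train_x = np.asarray(train_x, dtype=float)
        self.train_y = np.asarray(train_y)

    def new_state(self, query):
        # State: query, index of next training instance, best distance/label so far.
        return {"q": np.asarray(query, dtype=float), "i": 0,
                "best_d": float("inf"), "best_y": None}

    def step(self, state):
        """Spend one unit of time: compare the query against one more training instance."""
        i = state["i"]
        if i >= len(self.train_x):
            return False  # training set exhausted, nothing left to refine
        d = np.linalg.norm(state["q"] - self.train_x[i])
        if d < state["best_d"]:
            state["best_d"], state["best_y"] = d, self.train_y[i]
        state["i"] = i + 1
        return True

    def confidence(self, state):
        # Assumed heuristic: a closer best-so-far match means higher confidence.
        return 1.0 / (1.0 + state["best_d"])

def classify_serially(clf, queries, budget):
    """Baseline: split the budget evenly and finish each object before starting the next."""
    per_object = budget // len(queries)
    states = [clf.new_state(q) for q in queries]
    for s in states:
        for _ in range(per_object):
            if not clf.step(s):
                break
    return [s["best_y"] for s in states]

def classify_as_batch(clf, queries, budget):
    """Batch allocation: each step goes to the buffered object we are least sure about."""
    states = [clf.new_state(q) for q in queries]
    active = set(range(len(states)))
    for _ in range(budget):
        if not active:
            break
        worst = min(active, key=lambda k: clf.confidence(states[k]))
        if not clf.step(states[worst]):
            active.discard(worst)
    return [s["best_y"] for s in states]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_x = rng.normal(size=(200, 8))
    train_y = (train_x[:, 0] > 0).astype(int)
    buffered = rng.normal(size=(5, 8))          # K = 5 objects taken from the stream
    print(classify_serially(AnytimeNN(train_x, train_y), buffered, budget=300))
    print(classify_as_batch(AnytimeNN(train_x, train_y), buffered, budget=300))
```

Both policies spend the same total budget; the batch policy simply redirects each unit of work to whichever buffered object currently looks hardest, which is the intuition the abstract appeals to.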
