Harnessing the strengths of anytime algorithms for constant data streams

Anytime algorithms have been proposed for many different applications, e.g., in data mining. Their strength is twofold: they provide a first result after a very short initialization, and they improve this result as additional time becomes available. Consequently, anytime algorithms have so far been used when the available processing time varies, e.g., on data streams with varying inter-arrival times. In this paper we propose to employ anytime algorithms on constant data streams, i.e., for tasks with a constant time allowance per object. We introduce two approaches that harness the strengths of anytime algorithms on constant data streams and thereby improve the overall quality of the result with respect to the corresponding budget algorithm. We derive formulas for the expected performance gain and demonstrate the effectiveness of our novel approaches using existing anytime algorithms on benchmark data sets.
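The defining property described above — a usable interim result that can only improve with further computation — can be illustrated with a minimal sketch. The function below is a hypothetical anytime nearest-neighbor predictor (in the spirit of interruptible classification, not the paper's actual method): it scans training points one at a time under a per-object step budget, and its current best neighbor is always available as an interim answer.

```python
def anytime_knn_predict(query, train, budget_steps):
    """Anytime 1-NN-style prediction: examine training points one at a
    time. The best neighbor found so far is always available as an
    interim result, and additional steps can only shrink (never grow)
    the distance to that best neighbor -- the anytime property."""
    best_dist, best_label = float("inf"), None
    for i, (x, label) in enumerate(train):
        if i >= budget_steps:          # time allowance exhausted
            break
        d = abs(x - query)             # 1-D distance, for illustration only
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label, best_dist
```

On a constant stream, `budget_steps` is the same for every arriving object; the quality of each prediction is then monotone in that fixed budget, which is what makes trading buffered time between objects attractive.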
