Harnessing the strengths of anytime algorithms for constant data streams

Anytime algorithms have been proposed for many different applications, e.g., in data mining. Their strength is twofold: they provide a first result after a very short initialization, and they improve this result as additional time becomes available. Consequently, anytime algorithms have so far been used when the available processing time varies, e.g., on data streams with varying inter-arrival times. In this paper we propose to employ anytime algorithms on constant data streams, i.e., for tasks with a constant time allowance per object. We introduce two approaches that harness the strengths of anytime algorithms on constant data streams and thereby improve the overall quality of the result with respect to the corresponding budget algorithm. We derive formulas for the expected performance gain and demonstrate the effectiveness of our novel approaches using existing anytime algorithms on benchmark data sets.
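The defining property described above — a usable interim result that can only improve with further computation — can be illustrated with a minimal sketch. The function below is a hypothetical anytime nearest-neighbor predictor (in the spirit of interruptible classification, not the paper's actual method): it scans training points one at a time under a per-object step budget, and its current best neighbor is always available as an interim answer.

```python
def anytime_knn_predict(query, train, budget_steps):
    """Anytime 1-NN-style prediction: examine training points one at a
    time. The best neighbor found so far is always available as an
    interim result, and additional steps can only shrink (never grow)
    the distance to that best neighbor -- the anytime property."""
    best_dist, best_label = float("inf"), None
    for i, (x, label) in enumerate(train):
        if i >= budget_steps:          # time allowance exhausted
            break
        d = abs(x - query)             # 1-D distance, for illustration only
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label, best_dist
```

On a constant stream, `budget_steps` is the same for every arriving object; the quality of each prediction is then monotone in that fixed budget, which is what makes trading buffered time between objects attractive.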
