Classifying under computational resource constraints: anytime classification using probabilistic estimators

Abstract: In many online applications of machine learning, the computational resources available for classification vary over time. Most techniques are designed to operate within the constraints of the minimum expected resources, and fail to exploit further resources when they become available. We propose a novel anytime classification algorithm, anytime averaged probabilistic estimators (AAPE), which delivers strong prediction accuracy with little CPU time and utilizes additional CPU time to further increase classification accuracy. The idea is to run an ordered sequence of very efficient Bayesian probabilistic estimators (single improvement steps) until classification time runs out. Theoretical studies and empirical validation show that, by properly identifying, ordering, invoking and ensembling single improvement steps, AAPE delivers accurate classification whenever it is interrupted. It can also output class probability estimates beyond simple 0/1-loss classifications, and it handles incremental learning.
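The core anytime loop described in the abstract can be illustrated with a short sketch. The following Python is a minimal, hypothetical rendering of the idea, not the paper's implementation: the names anytime_classify, budget_seconds, ToyEstimator and predict_proba are invented here for illustration, and the real AAPE identifies, orders and ensembles its improvement steps far more carefully.

```python
import time

def anytime_classify(estimators, x, budget_seconds):
    """AAPE-style anytime loop (sketch): invoke cheap probabilistic
    estimators in order, averaging their class-probability vectors,
    until the CPU-time budget is exhausted."""
    deadline = time.monotonic() + budget_seconds
    totals, used = None, 0
    for est in estimators:
        # Always complete the first step so an answer is available even
        # under the tightest budget; afterwards, stop whenever the
        # deadline has passed (the "interrupt" point).
        if used > 0 and time.monotonic() >= deadline:
            break
        p = est.predict_proba(x)  # one efficient improvement step
        totals = list(p) if totals is None else [t + q for t, q in zip(totals, p)]
        used += 1
    return [t / used for t in totals]  # averaged class probabilities


class ToyEstimator:
    """Hypothetical stand-in for one efficient Bayesian estimator."""
    def __init__(self, probs):
        self.probs = probs

    def predict_proba(self, x):
        return self.probs  # a real estimator would condition on x


# Estimators ordered by expected benefit (e.g. naive Bayes first).
steps = [ToyEstimator([0.7, 0.3]), ToyEstimator([0.6, 0.4]), ToyEstimator([0.8, 0.2])]
print(anytime_classify(steps, x=None, budget_seconds=0.01))
```

Guaranteeing that the first, cheapest estimator always completes mirrors the requirement that an anytime classifier return a sensible answer even under the minimum expected resources; every subsequent step can only refine the averaged estimate.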
