Hedging predictions in machine learning

Recent advances in machine learning make it possible to design efficient prediction algorithms for data sets with huge numbers of parameters. This article describes a new technique for ‘hedging’ the predictions output by many such algorithms, including support vector machines, kernel ridge regression, kernel nearest neighbours, and many other state-of-the-art methods. The hedged predictions for the labels of new objects include quantitative measures of their own accuracy and reliability. These measures are provably valid under the assumption of randomness, traditional in machine learning: the objects and their labels are assumed to be generated independently from the same probability distribution. In particular, it becomes possible to control (up to statistical fluctuations) the number of erroneous predictions by selecting a suitable confidence level. Validity being achieved automatically, the remaining goal of hedged prediction is efficiency: taking full account of the new objects’ features and other available information to produce predictions that are as accurate as possible. This can be done successfully using the powerful machinery of modern machine learning.
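The mechanism behind such hedged predictions can be illustrated with a minimal split-conformal sketch. This is only an illustration of the general idea under simplifying assumptions (one-dimensional objects, a nearest-neighbour nonconformity score, a held-out calibration set), not the authors' exact algorithm; the function and variable names are invented for the example. For each candidate label, a p-value is computed from calibration scores, and the prediction set contains every label whose p-value exceeds the chosen significance level ε, so that the long-run error rate is controlled at ε under the randomness assumption.

```python
def score(x, y, train):
    """Nonconformity of (x, y): distance to the nearest training example labelled y."""
    ds = [abs(x - tx) for tx, ty in train if ty == y]
    return min(ds) if ds else float("inf")

def predict_set(x, train, calib, labels, epsilon):
    """Return the set of labels whose conformal p-value exceeds epsilon."""
    # Calibration scores are computed once, against the proper training set.
    cal_scores = [score(cx, cy, train) for cx, cy in calib]
    n = len(cal_scores)
    prediction = set()
    for y in labels:
        s = score(x, y, train)
        # p-value: fraction of calibration scores at least as nonconforming as s
        # (the +1 terms account for the test example itself).
        p = (sum(1 for c in cal_scores if c >= s) + 1) / (n + 1)
        if p > epsilon:
            prediction.add(y)
    return prediction

# Toy data: two well-separated classes on the real line.
train = [(0.0, "a"), (0.1, "a"), (1.0, "b"), (1.1, "b")]
calib = [(0.05, "a"), (1.05, "b"), (0.12, "a"), (0.95, "b")]
```

At a 20% significance level the predictor commits to a single label for a point near class "a", while at a stricter 10% level it hedges by outputting both labels: smaller ε (higher confidence) yields larger, more cautious prediction sets.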
