Hedging Predictions in Machine Learning: The Second Computer Journal Lecture

Recent advances in machine learning make it possible to design efficient prediction algorithms for data sets with huge numbers of parameters. This article describes a new technique for 'hedging' the predictions output by many such algorithms, including support vector machines, kernel ridge regression, kernel nearest neighbours and many other state-of-the-art methods. The hedged predictions for the labels of new objects include quantitative measures of their own accuracy and reliability. These measures are provably valid under the assumption of randomness, traditional in machine learning: the objects and their labels are assumed to be generated independently from the same probability distribution. In particular, it becomes possible to control (up to statistical fluctuations) the number of erroneous predictions by selecting a suitable confidence level. Validity being achieved automatically, the remaining goal of hedged prediction is efficiency: taking full account of the new objects' features and other available information to produce predictions that are as accurate as possible. This can be done successfully using the powerful machinery of modern machine learning.
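To make the mechanism concrete: in the conformal-prediction framework this line of work develops, each candidate label for a new object receives a p-value measuring how strange the new example would look with that label attached, and the hedged prediction at confidence level 1 - epsilon is the set of all labels whose p-value exceeds epsilon. The sketch below is illustrative only, not code from the lecture; it assumes a 1-nearest-neighbour nonconformity measure, and the function names and toy data are hypothetical.

```python
import numpy as np

def nn_nonconformity(i, X, Y):
    """1-nearest-neighbour nonconformity score for example i: distance to the
    nearest example with the same label divided by distance to the nearest
    example with a different label (larger means stranger). Assumes at least
    two distinct labels are present."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                      # exclude the example itself
    same = d[Y == Y[i]].min()
    diff = d[Y != Y[i]].min()
    return same / diff

def conformal_region(X_train, y_train, x_new, epsilon=0.05):
    """Prediction set at confidence level 1 - epsilon: every label whose
    p-value exceeds epsilon when (x_new, label) is tentatively appended
    to the training set."""
    region = []
    for label in np.unique(y_train):
        X = np.vstack([X_train, x_new])
        Y = np.append(y_train, label)
        scores = np.array([nn_nonconformity(i, X, Y) for i in range(len(Y))])
        p_value = np.mean(scores >= scores[-1])  # fraction at least as strange
        if p_value > epsilon:
            region.append(int(label))
    return region

# Toy usage: two well-separated classes in the plane.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(conformal_region(X_train, y_train, np.array([0.05, 0.1]), epsilon=0.2))
```

Under the randomness assumption, predictions made this way at significance level epsilon are wrong (i.e. the prediction set misses the true label) on at most a fraction epsilon of new examples, up to statistical fluctuations; choosing a smaller epsilon trades larger, more cautious prediction sets for fewer errors, which is the sense in which the error rate is controlled by the confidence level.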
