The typicalness framework: a comparison with the Bayesian approach

When correct priors are known, Bayesian algorithms give optimal decisions and yield accurate confidence values for their predictions. If the prior is incorrect, however, these confidence values have no theoretical basis, even though the algorithms' predictive performance may still be good. There also exist many successful learning algorithms that rely only on the i.i.d. assumption; often, however, they produce no confidence values for their predictions. Bayesian frameworks are often applied to such algorithms in order to obtain confidence values, but they may rely on unjustified priors. In this paper we outline the typicalness framework, which can be used in conjunction with many other machine learning algorithms. The framework provides confidence information based only on the standard i.i.d. assumption and is therefore much more robust to different underlying data distributions. We show how the framework can be applied to existing algorithms. We also present experimental results showing that the typicalness approach performs close to Bayes when the prior is known to be correct. Unlike the Bayesian approach, however, the typicalness method still gives accurate confidence values even when different data distributions are considered.
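For concreteness, the sketch below illustrates how typicalness p-values are commonly computed in the transductive confidence machine literature, assuming a simple nearest-neighbour strangeness measure; the function names and toy data are illustrative assumptions, not the paper's own implementation. For each candidate label of a new example, every example in the extended sequence receives a strangeness score, and the p-value of that label is the fraction of examples at least as strange as the new one; the predicted label is the one with the largest p-value, confidence is one minus the second-largest p-value, and credibility is the largest p-value.

import numpy as np

def strangeness(X, y, i):
    # Nearest-neighbour strangeness (an illustrative choice): distance to the
    # nearest example with the same label divided by the distance to the
    # nearest example with a different label; larger values mean "stranger".
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf  # never compare an example with itself
    return d[y == y[i]].min() / d[y != y[i]].min()

def typicalness_p_values(X_train, y_train, x_new, labels):
    # For each candidate label, extend the training sequence with the new
    # example carrying that label and compute the fraction of examples that
    # are at least as strange as the new one (its typicalness p-value).
    p = {}
    for label in labels:
        X = np.vstack([X_train, x_new])
        y = np.append(y_train, label)
        alphas = np.array([strangeness(X, y, i) for i in range(len(y))])
        p[label] = np.mean(alphas >= alphas[-1])
    return p

# Toy usage: predict the label with the largest p-value; confidence is one
# minus the second-largest p-value, credibility is the largest p-value.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
p = typicalness_p_values(X_train, y_train, np.array([0.95, 1.0]), labels=[0, 1])
ranked = sorted(p.values(), reverse=True)
prediction = max(p, key=p.get)
confidence, credibility = 1.0 - ranked[1], ranked[0]
print(prediction, confidence, credibility)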
