A Universal Well-Calibrated Algorithm for On-line Classification

We study the problem of on-line classification in which the prediction algorithm is given a confidence level 1 - δ and is required to output as its prediction a range of labels (intuitively, those labels deemed compatible with the available data at the level δ) rather than just one label; as usual, the examples are assumed to be generated independently from the same probability distribution P. The prediction algorithm is said to be well-calibrated for P and δ if the long-run relative frequency of errors does not exceed δ almost surely with respect to P. For well-calibrated algorithms we take the number of uncertain predictions (i.e., those containing more than one label) as the principal measure of predictive performance. The main result of this paper is the construction of a prediction algorithm which, for any (unknown) P and any δ: (a) makes errors independently and with probability δ at every trial (in particular, is well-calibrated for P and δ); (b) makes in the long run no more uncertain predictions than any other prediction algorithm that is well-calibrated for P and δ; (c) processes example n in time O(log n).
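To make the notions of an error (the true label falls outside the predicted range), an uncertain prediction (the range contains more than one label), and calibration concrete, here is a minimal Python sketch of a smoothed conformal predictor (also called a transductive confidence machine) for two labels and real-valued objects. It is not the algorithm constructed in the paper. The nearest-neighbour nonconformity score, the Gaussian data generator, and all function names are illustrative assumptions, and the sketch takes quadratic rather than logarithmic time per example. On i.i.d. data its errors are independent with probability δ at every trial, as in property (a), but it makes no claim to the optimality in (b).

```python
import random


def nonconformity(bag, i):
    """Illustrative 1-nearest-neighbour nonconformity score of example i in bag.

    bag is a list of (x, y) pairs with real-valued x (an assumption of this
    sketch). The score is the distance from x_i to the nearest other example
    with the same label divided by the distance to the nearest example with a
    different label; larger means the example looks stranger.
    """
    x_i, y_i = bag[i]
    same = min((abs(x - x_i) for j, (x, y) in enumerate(bag) if j != i and y == y_i),
               default=float("inf"))
    diff = min((abs(x - x_i) for x, y in bag if y != y_i), default=float("inf"))
    if same == diff:  # also covers the case where both are infinite
        return 1.0
    if diff == 0.0:
        return float("inf")
    return same / diff


def smoothed_p_value(bag, tau):
    """Smoothed p-value of the last example in bag; tau is uniform on [0, 1)."""
    scores = [nonconformity(bag, i) for i in range(len(bag))]
    a_new = scores[-1]
    greater = sum(a > a_new for a in scores)
    equal = sum(a == a_new for a in scores)  # counts the last example itself
    return (greater + tau * equal) / len(bag)


def predict_set(history, x_new, labels, delta, tau):
    """Range of labels whose smoothed p-value exceeds the significance level delta."""
    return {y for y in labels
            if smoothed_p_value(history + [(x_new, y)], tau) > delta}


def run(n_trials, delta, seed=0):
    """Simulate on-line prediction on i.i.d. data; return (errors, uncertain)."""
    rng = random.Random(seed)
    history, errors, uncertain = [], 0, 0
    for _ in range(n_trials):
        y = rng.randrange(2)
        x = rng.gauss(2.0 * y, 1.0)   # hypothetical data: two overlapping Gaussian classes
        tau = rng.random()            # fresh smoothing variable for this trial
        region = predict_set(history, x, (0, 1), delta, tau)
        errors += y not in region     # error: true label outside the predicted range
        uncertain += len(region) > 1  # uncertain: more than one label predicted
        history.append((x, y))        # the true label is revealed after predicting
    return errors, uncertain
```

Calling `run(200, 0.1)` should report roughly 20 errors, since under i.i.d. data the error count of a smoothed conformal predictor is binomially distributed with mean nδ. The number of uncertain predictions is the quantity that depends on the choice of nonconformity score, and it is this count that property (b) is about.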
