PAC Classification based on PAC Estimates of Label Class Distributions

A standard approach in pattern classification is to estimate the distributions of the label classes, and then to use the Bayes classifier (applied to the estimated distributions) to classify unlabelled examples. As one might expect, the better our estimates of the label class distributions, the better the resulting classifier. In this paper we verify this observation in the (agnostic) PAC setting and identify precise bounds on the misclassification rate in terms of the quality of the estimates of the label class distributions, as measured by variation distance or KL-divergence. We show how agnostic PAC learnability relates to estimates of the distributions that have a PAC guarantee on their variation distances from the true distributions, and we express the increase in negative log-likelihood risk in terms of PAC bounds on the KL-divergences.
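To make the plug-in approach concrete, here is a minimal sketch (not taken from the paper) on a finite domain with two classes and known priors: it builds the Bayes classifier from empirically estimated class distributions and checks the excess misclassification rate against the standard bound of twice the prior-weighted variation distance. The function names (`plug_in_classifier`, `variation_distance`) and the toy distributions are illustrative assumptions.

```python
import numpy as np

def variation_distance(p, q):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def plug_in_classifier(priors, class_dists):
    """Bayes rule applied to (possibly estimated) class distributions.

    priors[y] is the prior of class y; class_dists[y][x] the probability
    of point x under class y. Returns a function mapping x to a label.
    """
    def classify(x):
        return max(range(len(priors)),
                   key=lambda y: priors[y] * class_dists[y][x])
    return classify

# Toy setup: two true class distributions on a 4-point domain, equal priors.
rng = np.random.default_rng(0)
true_dists = [np.array([0.5, 0.3, 0.1, 0.1]),
              np.array([0.1, 0.1, 0.3, 0.5])]
priors = [0.5, 0.5]

# Empirical estimates of each class distribution from m samples per class.
m = 200
est_dists = [np.bincount(rng.choice(4, size=m, p=d), minlength=4) / m
             for d in true_dists]

bayes = plug_in_classifier(priors, true_dists)   # optimal rule
plug_in = plug_in_classifier(priors, est_dists)  # rule from estimates

def error(rule):
    """Exact misclassification rate: sum of prior * P_y(x) over errors."""
    return sum(priors[y] * true_dists[y][x]
               for y in range(2) for x in range(4) if rule(x) != y)

# The usual plug-in argument: the excess risk is at most twice the
# prior-weighted variation distance between true and estimated distributions.
excess = error(plug_in) - error(bayes)
bound = 2 * sum(priors[y] * variation_distance(true_dists[y], est_dists[y])
                for y in range(2))
print(f"excess risk {excess:.4f} <= bound {bound:.4f}")
```

A PAC guarantee of variation distance at most epsilon on each estimate thus translates, via this argument, into misclassification rate at most the optimum plus 2*epsilon.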
