Metric Entropy and Minimax Risk in Classification

We apply recent results on the minimax risk in density estimation to the related problem of pattern classification. The notion of loss we seek to minimize is an information-theoretic measure of how well we can predict the classification of future examples, given the classification of previously seen examples. We give an asymptotic characterization of the minimax risk in terms of the metric entropy properties of the class of distributions that might be generating the examples. We then use these results to characterize the minimax risk in the special case of noisy two-valued classification problems in terms of the Assouad density and the Vapnik-Chervonenkis dimension.
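Schematically, the quantity at stake is the cumulative relative-entropy (log-loss) risk of sequentially predicting labels. The display below is an illustrative sketch in the spirit of the Haussler-Opper line of work, not the paper's exact theorem; the symbols and scaling exponents are stated informally.

```latex
% Cumulative log-loss (relative-entropy) minimax risk: after seeing the
% labeled examples Z_1^{t-1}, predict the label Y_t of instance X_t.
% Notation here is illustrative, not the paper's exact statement.
R_n(\mathcal{P}) \;=\; \inf_{\hat{P}} \,\sup_{P \in \mathcal{P}}
  \sum_{t=1}^{n} \mathbb{E}\, D\!\left( P(\,\cdot \mid X_t) \;\middle\|\;
  \hat{P}(\,\cdot \mid X_t, Z_1^{t-1}) \right),
\qquad Z_i = (X_i, Y_i).

% Schematic form of the metric-entropy characterization: if the metric
% entropy of \mathcal{P} scales polynomially, H(\varepsilon) \asymp
% \varepsilon^{-r}, the cumulative risk grows polynomially,
% R_n \asymp n^{r/(r+2)}; for classes with finite Vapnik-Chervonenkis
% dimension d, the growth is only logarithmic, R_n = O(d \log n),
% i.e. a per-prediction risk on the order of d/n.
```

The point of the characterization is that the geometry of the class (its metric entropy, or its Assouad/VC combinatorial dimensions in the two-valued case) controls the minimax prediction risk on both sides.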