Concept learning using complexity regularization

In pattern recognition or, as it has also been called, concept learning, the value of a {0,1}-valued random variable Y is to be predicted based upon observing an R^d-valued random variable X. We apply the method of complexity regularization to learn concepts from large concept classes. The method is shown to automatically find a good balance between the approximation error and the estimation error. In particular, the error probability of the obtained classifier is shown to converge to the achievable optimum at rate O(√(log n / n)), for large nonparametric classes of distributions, as the sample size n grows. We also show that if the Bayes error probability is zero and the Bayes rule is in a known family of decision rules, the error probability is O(log n / n) for many large families, possibly with infinite VC dimension.

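To make the balancing idea concrete, the following is a minimal Python sketch of the selection step: among candidate decision rules taken from classes of increasing complexity, pick the one minimizing the empirical error plus a penalty of order √(complexity · log n / n). The function names, the toy data, and the exact penalty form are illustrative assumptions, a simplified stand-in for the paper's VC-dimension-based construction, not the paper's own algorithm.

```python
import numpy as np

def penalized_score(errors, complexity, n):
    """Empirical error plus a VC-type complexity penalty:
    roughly L_hat + sqrt(complexity * log n / n). The penalty
    constant here is a placeholder, not the paper's bound."""
    return np.mean(errors) + np.sqrt(complexity * np.log(n) / n)

def select_by_complexity_regularization(rules, complexities, X, y):
    """Return the rule minimizing the penalized empirical error.

    rules        -- callables mapping an array of points to {0,1} labels
    complexities -- one complexity value per rule (e.g. proportional to
                    the VC dimension of the class it was chosen from)
    """
    n = len(y)
    scores = [
        penalized_score(rule(X) != y, c, n)   # misclassification indicators
        for rule, c in zip(rules, complexities)
    ]
    return rules[int(np.argmin(scores))]

# Toy usage: two threshold rules on a one-dimensional sample.
rng = np.random.default_rng(0)
X = rng.uniform(size=200)
y = (X > 0.6).astype(int)
rules = [lambda x: (x > 0.5).astype(int), lambda x: (x > 0.6).astype(int)]
best = select_by_complexity_regularization(rules, complexities=[1.0, 1.0], X=X, y=y)
```

In a fuller treatment each candidate rule would itself be obtained by empirical risk minimization over its class, so the penalty trades the approximation error of small classes against the estimation error of large ones.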