From Theoretical Learnability to Statistical Measures of the Learnable

The main focus of theoretical models of machine learning is to formally describe what it means for a concept to be learnable, what a learning process is, and what the relationship is between a learning agent and a teaching one. However, when we prove theoretically that a concept is learnable, we have no a priori idea of how difficult the target concept is to learn. In this paper, after recalling some theoretical concepts and the main estimation methods, we provide a learning-system-independent measure of the difficulty of learning a concept. It is based on geometrical and statistical notions, and on the implicit assumption that distinct classes occupy distinct regions of the feature space. In this context, we identify learnability with the level of class separability in the feature space. Our definition is constructive, relies on a statistical test, and has been implemented on problems from the UCI repository. The results are convincing and fit well with theoretical results and intuition. Finally, in order to reduce the computational cost of our approach, we propose a new way of characterizing the geometrical regions using a k-nearest-neighbors graph. We show experimentally that it yields accuracy estimates close to those obtained by leave-one-out cross-validation, with a smaller standard deviation.
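As a rough illustration of the kind of k-nearest-neighbors-graph estimate the abstract alludes to, the sketch below computes, on a UCI-style dataset, the fraction of points whose k-NN neighborhood is dominated by their own class and compares it with a leave-one-out cross-validation accuracy. This is a hypothetical simplification for intuition only, not the statistical test or the exact graph-based estimator defined in the paper; the dataset, the value of k, and the majority-vote agreement score are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)   # assumed stand-in for a UCI problem
k = 5                               # assumed neighborhood size

# Build the k-NN graph (each point linked to its k nearest neighbors,
# excluding itself) and score how often a point's neighborhood is
# dominated by its own class -- a crude class-separability proxy.
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
_, idx = nn.kneighbors(X)
neigh_labels = y[idx[:, 1:]]                      # drop the point itself
agreement = (neigh_labels == y[:, None]).mean(axis=1)
knn_graph_estimate = (agreement > 0.5).mean()

# Baseline: leave-one-out cross-validation accuracy of a k-NN classifier.
loo_scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y,
                             cv=LeaveOneOut())

print(f"k-NN-graph estimate: {knn_graph_estimate:.3f}")
print(f"LOO-CV accuracy:     {loo_scores.mean():.3f}")
```

On well-separated data the two numbers should be close, which is the behavior the paper reports, while the graph-based score is computed from a single neighborhood structure rather than n retrainings.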
