The Dipping Phenomenon

One typically expects classifiers to perform better as the training set grows, or at least to attain their best performance in the limit of an infinite number of training samples. We demonstrate, however, that there are classification problems on which particular classifiers attain their optimum performance at a finite training set size. Whether or not this phenomenon, which we term dipping, can be observed depends on the choice of classifier in relation to the underlying class distributions. We give some simple examples, for a few classifiers, that illustrate how the dipping phenomenon can occur. Additionally, we speculate about what is generally needed for dipping to emerge. What is clear is that this kind of learning curve behavior does not arise by mere chance and that the pattern recognition practitioner ought to take note of it.

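As an informal illustration of how such a learning curve can arise, the sketch below simulates a hypothetical one-dimensional two-class problem for a nearest mean classifier. The construction and all numbers are assumptions made for this illustration, not the paper's own examples: class A puts most of its mass at x = 0 and a small amount at x = 10, class B sits entirely at x = 1, so the classifier that the population means induce is worse than the one that small samples (which tend to miss the rare point at x = 10) typically produce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D two-class problem (an assumption for illustration, not the
# paper's construction): class A has mass 0.8 at x = 0 and 0.2 at x = 10,
# class B has all its mass at x = 1. The population means are 2 (A) and 1 (B),
# so asymptotically the nearest mean classifier labels everything left of 1.5
# as B and misclassifies the 80% of class A sitting at x = 0 (error 0.4 with
# equal priors). A small sample often misses the rare point at x = 10, flips
# the estimated ordering of the means, and attains an error of only 0.1.

def sample_A(n):
    return np.where(rng.random(n) < 0.8, 0.0, 10.0)

def sample_B(n):
    return np.ones(n)

def nmc_error(mean_a, mean_b):
    """True error (equal priors) of the nearest mean rule with given estimated means."""
    if mean_a == mean_b:
        return 0.5  # degenerate tie; score it at chance level

    def label(x):
        # assign x to the class with the nearer estimated mean
        return 'A' if abs(x - mean_a) < abs(x - mean_b) else 'B'

    err_a = 0.8 * (label(0.0) != 'A') + 0.2 * (label(10.0) != 'A')
    err_b = 1.0 * (label(1.0) != 'B')
    return 0.5 * err_a + 0.5 * err_b

def expected_error(n, reps=20000):
    """Monte Carlo estimate of the expected true error at n training samples per class."""
    errs = [nmc_error(sample_A(n).mean(), sample_B(n).mean()) for _ in range(reps)]
    return float(np.mean(errs))

for n in [1, 2, 5, 10, 50, 200]:
    print(f"n = {n:4d}  expected error ~ {expected_error(n):.3f}")

# The printed curve rises from roughly 0.16 at n = 1 towards the asymptotic 0.4:
# the classifier's best expected performance is reached at a finite sample size.
```

The point of the sketch is only that the learning rule (here, thresholding midway between the estimated class means) need not converge to the best member of its own hypothesis class, which is the kind of mismatch between learner and class distributions under which dipping can appear.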