Measuring the complexity of classification problems

We study a number of measures that characterize the difficulty of a classification problem. We compare a set of real-world problems to random combinations of points in this measurement space and find that real problems contain structures that are significantly different from the random sets. The distribution of problems in this space reveals that at least two independent factors affect a problem's difficulty, and that they have notable joint effects. We suggest using this space to describe a classifier's domain of competence. This can guide static and dynamic selection of classifiers for specific problems as well as sub-problems formed by confinement, projections, and transformations of the feature vectors.
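
The abstract does not list the individual measures, so the following is only an illustrative sketch of what one such measure could look like, not the authors' definition. It assumes a Fisher-style discriminant ratio (squared difference of class means over the sum of class variances, maximized over features) and compares a structured two-class problem with the same points under randomly shuffled labels. The function name `fisher_ratio` and the synthetic data are hypothetical.

```python
import random
from statistics import fmean, pvariance


def fisher_ratio(points, labels):
    """Largest per-feature Fisher-style ratio for a two-class problem.

    Assumed measure for illustration only: (mean_a - mean_b)^2 / (var_a + var_b),
    computed feature by feature; a higher value suggests an easier problem.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a = [p for p, lab in zip(points, labels) if lab == classes[0]]
    b = [p for p, lab in zip(points, labels) if lab == classes[1]]
    best = 0.0
    for j in range(len(points[0])):
        col_a = [p[j] for p in a]
        col_b = [p[j] for p in b]
        num = (fmean(col_a) - fmean(col_b)) ** 2
        den = pvariance(col_a) + pvariance(col_b)
        if den > 0:
            ratio = num / den
        else:
            # Zero variance in both classes: separable if the means differ.
            ratio = float("inf") if num > 0 else 0.0
        best = max(best, ratio)
    return best


# Synthetic two-class data (hypothetical): classes differ only in feature 0.
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
points += [(rng.gauss(3, 1), rng.gauss(0, 1)) for _ in range(200)]
labels = [0] * 200 + [1] * 200

# Random labeling of the same points, as a stand-in for a "random" problem.
shuffled = labels[:]
rng.shuffle(shuffled)

ratio_structured = fisher_ratio(points, labels)
ratio_random = fisher_ratio(points, shuffled)
print(ratio_structured, ratio_random)
```

Under these assumptions the structured problem should score far higher than the randomly labeled one, which is the kind of separation between real and random problems that the abstract describes in the measurement space.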
