Object Representation, Sample Size, and Data Set Complexity

The complexity of a pattern recognition problem is determined by its representation. It is argued and illustrated by examples that the sampling density of a given data set and the resulting complexity of a learning problem are inherently connected. A number of criteria are constructed to judge this complexity for the chosen dissimilarity representation. Some nonlinear transformations of the original representation are also investigated to illustrate that such changes may affect the resulting complexity. If the initial sampling density is insufficient, a transformation may result in a data set of lower complexity with satisfactory sampling. On the other hand, if the number of samples is originally abundant, the transformed representation may become more complex.
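
The link between sample size and a complexity criterion on a dissimilarity representation can be sketched in a few lines of Python. Everything below is an illustrative assumption rather than the method of the paper: the toy two-class Gaussian data, the Euclidean dissimilarity, the leave-one-out nearest-neighbour error used as a stand-in criterion, and the function names `pairwise_dissimilarity`, `loo_nn_error`, `power_transform` and `sample_two_classes` are all hypothetical.

```python
import numpy as np


def pairwise_dissimilarity(X):
    """Euclidean dissimilarity matrix between all rows of X (an assumed choice)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def loo_nn_error(D, y):
    """Leave-one-out 1-NN error computed directly from a dissimilarity matrix.

    Used here only as an illustrative complexity proxy.
    """
    D = np.array(D, dtype=float)      # work on a copy
    np.fill_diagonal(D, np.inf)       # an object may not be its own neighbour
    nearest = D.argmin(axis=1)
    return float(np.mean(y[nearest] != y))


def power_transform(D, p):
    """Element-wise nonlinear transformation d -> d**p, with p > 0 (assumed form)."""
    return np.asarray(D, dtype=float) ** p


def sample_two_classes(n_per_class, rng):
    """Hypothetical toy data: two overlapping 2-D Gaussian classes."""
    X0 = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 2))
    X1 = rng.normal(loc=2.0, scale=1.0, size=(n_per_class, 2))
    X = np.vstack([X0, X1])
    y = np.repeat([0, 1], n_per_class)
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (5, 20, 80, 320):
        X, y = sample_two_classes(n, rng)
        D = pairwise_dissimilarity(X)
        # Print the sample size per class and the criterion before and
        # after the nonlinear transformation.
        print(n, loo_nn_error(D, y), loo_nn_error(power_transform(D, 0.5), y))
```

Because the power transformation is monotone, it preserves neighbour rankings, so the nearest-neighbour criterion is the same before and after it. A criterion that uses the dissimilarity values themselves, rather than only their order, would be needed to observe the effect of such a transformation.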
