Generative models for similarity-based classification

A maximum-entropy approach to generative similarity-based classification is proposed. First, a descriptive set of similarity statistics is assumed to be sufficient for classification. Then the class-conditional distributions of these descriptive statistics are estimated as the maximum-entropy distributions subject to empirical moment constraints. The resulting exponential class-conditional distributions are used in a maximum a posteriori decision rule, forming the similarity discriminant analysis (SDA) classifier. Experiments on simulated and real data compare the performance of SDA with that of the k-nearest neighbor classifier, the nearest-centroid classifier, and the potential support vector machine (PSVM).
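
The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The following choices are assumptions made only for illustration: samples are sets of features, the similarity is the number of shared features (so it takes values on a finite integer support), the descriptive statistics are the similarities to one prototype per class (the class medoid), and the only moment constraint is the empirical mean of each statistic. Under mean constraints alone, the maximum-entropy distribution on a finite support has the exponential form p(s) ∝ exp(λs) and the joint distribution factorizes over the statistics. All function names are hypothetical.

```python
import math
from collections import defaultdict

def similarity(a, b):
    # Assumed counting similarity: number of features shared by two sets.
    return len(a & b)

def maxent_log_pmf(support, target_mean, lo=-50.0, hi=50.0, iters=100):
    """Log-pmf of the max-entropy distribution on `support` with a given mean.

    The solution has the form p(s) ∝ exp(lam * s); lam is found by bisection,
    which works because the mean is non-decreasing in lam. If the target mean
    sits at the edge of the support, lam saturates at the search bound.
    """
    def log_pmf(lam):
        scores = [lam * s for s in support]
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in scores))
        return [v - log_z for v in scores]

    def mean(lam):
        return sum(math.exp(lp) * s for lp, s in zip(log_pmf(lam), support))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return log_pmf(0.5 * (lo + hi))

def fit_sda(samples, labels):
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    # Similarities to a training sample never exceed the largest set size.
    support = list(range(max(len(x) for x in samples) + 1))
    # Assumed prototype: the medoid, i.e. the sample most similar to its class.
    centroids = {h: max(xs, key=lambda c: sum(similarity(c, z) for z in xs))
                 for h, xs in by_class.items()}
    # One exponential distribution per (class h, prototype g) pair, with the
    # mean constrained to the empirical mean similarity of class h to mu_g.
    log_pmfs = {}
    for h, xs in by_class.items():
        for g, mu in centroids.items():
            mean_sim = sum(similarity(z, mu) for z in xs) / len(xs)
            log_pmfs[(h, g)] = maxent_log_pmf(support, mean_sim)
    log_prior = {h: math.log(len(xs) / len(samples))
                 for h, xs in by_class.items()}
    return centroids, log_pmfs, log_prior

def predict_sda(model, x):
    centroids, log_pmfs, log_prior = model

    def log_posterior(h):
        # MAP rule: log prior plus the sum of per-statistic log-likelihoods.
        return log_prior[h] + sum(log_pmfs[(h, g)][similarity(x, mu)]
                                  for g, mu in centroids.items())

    return max(log_prior, key=log_posterior)
```

A model would be fitted with `fit_sda(samples, labels)` and queried with `predict_sda(model, x)`. Working in log-probabilities avoids underflow when a class has mean similarity zero to another class's prototype, in which case the fitted exponential is nearly degenerate.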
