Combining information theoretic kernels with generative embeddings for classification

Classical approaches to learning classifiers for structured objects (e.g., images, sequences) use generative models in a standard Bayesian framework. To exploit the state-of-the-art performance of discriminative learning while also taking advantage of generative models of the data, generative embeddings have recently been proposed as a way of building hybrid discriminative/generative approaches. A generative embedding is a mapping, induced by a generative model (usually learned from data), from the object space into a fixed-dimensional space suitable for discriminative classifier learning. Generative embeddings have been shown to often outperform the classifiers obtained directly from the generative models upon which they are built. Using a generative embedding for classification involves two main steps: (i) defining and learning a generative model and using it to build the embedding; (ii) discriminatively learning a (possibly kernel-based) classifier on the embedded data. The literature on generative embeddings focuses essentially on step (i), usually taking some standard off-the-shelf tool for step (ii). Here, we adopt a different approach by also focusing on the discriminative learning step. Specifically, we exploit the probabilistic nature of generative embeddings by using kernels defined on probability measures; we investigate the use of a recent family of non-extensive information theoretic kernels on top of different generative embeddings. We show, in different medical applications, that the approach yields state-of-the-art performance.
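
The two-step scheme can be illustrated with a minimal sketch. It is an illustration only, not the models or kernels evaluated in the paper: the generative model is assumed to be a one-dimensional two-component Gaussian mixture with hand-picked (hypothetical) parameters, the embedding is the vector of component posteriors, and the kernel is the Jensen-Shannon kernel, ln 2 minus the Jensen-Shannon divergence, used here as a familiar Shannon-entropy stand-in for the non-extensive family. All function and variable names are assumptions introduced for this example.

```python
import math

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def js_kernel(p, q):
    """Jensen-Shannon kernel: ln 2 minus the Jensen-Shannon divergence."""
    mid = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    js_div = entropy(mid) - (entropy(p) + entropy(q)) / 2.0
    return math.log(2.0) - js_div

def posterior_embedding(x, weights, means, stds):
    """Step (i): map a scalar x to the component posteriors of a Gaussian mixture."""
    joint = [
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    ]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical mixture parameters; in practice they would be learned from data.
WEIGHTS, MEANS, STDS = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]

def gram_matrix(xs):
    """Step (ii) input: kernel values between the embedded objects."""
    emb = [posterior_embedding(x, WEIGHTS, MEANS, STDS) for x in xs]
    return [[js_kernel(a, b) for b in emb] for a in emb]

K = gram_matrix([-2.0, -1.9, 2.0])
```

The resulting matrix `K` could then be passed to any kernel classifier that accepts a precomputed Gram matrix (for example, scikit-learn's `SVC(kernel="precomputed")`). Because each embedded point is a probability vector, a kernel defined on probability measures applies directly, which is the property the approach exploits.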
