Paper information - Combining information theoretic kernels with generative embeddings for classification

Classical approaches to learning classifiers for structured objects (e.g., images, sequences) use generative models in a standard Bayesian framework. To exploit the state-of-the-art performance of discriminative learning while also taking advantage of generative models of the data, generative embeddings have recently been proposed as a way of building hybrid discriminative/generative approaches. A generative embedding is a mapping, induced by a generative model (usually learned from data), from the object space into a fixed-dimensional space suitable for discriminative classifier learning. Generative embeddings have been shown to often outperform the classifiers obtained directly from the generative models upon which they are built. Using a generative embedding for classification involves two main steps: (i) defining and learning a generative model and using it to build the embedding; (ii) discriminatively learning a (possibly kernel-based) classifier on the embedded data. The literature on generative embeddings focuses essentially on step (i), usually taking some standard off-the-shelf tool for step (ii). Here, we adopt a different approach by also focusing on the discriminative learning step. In particular, we exploit the probabilistic nature of generative embeddings by using kernels defined on probability measures; specifically, we investigate the use of a recent family of non-extensive information theoretic kernels on top of different generative embeddings. We show, in different medical applications, that the approach yields state-of-the-art performance.
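
Step (ii) of the pipeline described in the abstract can be illustrated with a minimal sketch. It assumes that each object has already been mapped by some generative embedding to a non-negative vector that can be normalized into a probability distribution (for example, a topic-posterior vector). The kernel used here is the Jensen-Shannon kernel, k(p, q) = ln 2 - JS(p, q). This is a classical information theoretic kernel chosen for simplicity, not necessarily the exact kernel evaluated in the paper. The function names and the toy `embeddings` values are hypothetical.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def normalize(v):
    """Rescale a non-negative vector so that its entries sum to one."""
    total = sum(v)
    if total <= 0.0:
        raise ValueError("embedding must have positive total mass")
    return [x / total for x in v]


def jensen_shannon_kernel(p, q):
    """Jensen-Shannon kernel: ln 2 minus the Jensen-Shannon divergence."""
    mixture = [(a + b) / 2.0 for a, b in zip(p, q)]
    js = shannon_entropy(mixture) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))
    return math.log(2.0) - js


def gram_matrix(embeddings):
    """Kernel (Gram) matrix over a list of embedded objects."""
    probs = [normalize(e) for e in embeddings]
    return [[jensen_shannon_kernel(a, b) for b in probs] for a in probs]


# Hypothetical embedded objects: three objects in a 3-dimensional embedding space.
embeddings = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
]
K = gram_matrix(embeddings)
```

A matrix such as `K` could then be handed to any kernel classifier that accepts a precomputed Gram matrix, for instance an SVM configured with a precomputed kernel.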
