Deep kernel dimensionality reduction for scalable data integration

Dimensionality reduction is used to preserve significant properties of data in a low-dimensional space. In particular, data representation in a lower dimension is needed in applications, where information comes from multiple high dimensional sources. Data integration, however, is a challenge in itself.In this contribution, we consider a general framework to perform dimensionality reduction taking into account that data are heterogeneous. We propose a novel approach, called Deep Kernel Dimensionality Reduction which is designed for learning layers of new compact data representations simultaneously. The method can be also used to learn shared representations between modalities. We show by experiments on standard and on real large-scale biomedical data sets that the proposed method embeds data in a new compact meaningful representation, and leads to a lower classification error compared to the state-of-the-art methods.

[1]  Martin Dugas,et al.  Comparative study of unsupervised dimension reduction techniques for the visualization of microarray gene expression data , 2010, BMC Bioinformatics.

[2]  Kurt Hornik,et al.  kernlab - An S4 Package for Kernel Methods in R , 2004 .

[3]  P. Bork,et al.  Richness of human gut microbiome correlates with metabolic markers , 2013, Nature.

[4]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[5]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[6]  Mikhail Belkin,et al.  Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.

[7]  Michael I. Jordan,et al.  Unsupervised Kernel Dimension Reduction , 2010, NIPS.

[8]  Michael I. Jordan,et al.  Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.

[9]  Alison Pease,et al.  A Formal Cognitive Model of Mathematical Metaphors , 2009, KI.

[10]  Paul Scheet,et al.  A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. , 2006, American journal of human genetics.

[11]  Ian T. Jolliffe,et al.  Principal Component Analysis , 2002, International Encyclopedia of Statistical Science.

[12]  H. Hotelling Analysis of a complex of statistical variables into principal components. , 1933 .

[13]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[14]  Joshua B. Tenenbaum,et al.  Global Versus Local Methods in Nonlinear Dimensionality Reduction , 2002, NIPS.

[15]  Philippe Leray,et al.  A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies , 2011, BMC Bioinformatics.

[16]  Michael I. Jordan,et al.  Kernel Dimensionality Reduction for Supervised Learning , 2003, NIPS.

[17]  Mathieu Almeida,et al.  Dietary intervention impact on gut microbial gene richness , 2013, Nature.

[18]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[19]  Adel Javanmard,et al.  Learning Linear Bayesian Networks with Latent Variables , 2012, ICML.

[20]  Bernhard Schölkopf,et al.  A Kernel Method for the Two-Sample-Problem , 2006, NIPS.

[21]  Chuong B Do,et al.  What is the expectation maximization algorithm? , 2008, Nature Biotechnology.

[22]  Francis R. Bach,et al.  Structured Sparse Principal Component Analysis , 2009, AISTATS.

[23]  Ioannis Gkioulekas,et al.  Dimensionality Reduction Using the Sparse Linear Model , 2011, NIPS.

[24]  Michael I. Jordan,et al.  Consistent probabilistic outputs for protein function prediction , 2008, Genome Biology.

[25]  Giorgio Valentini,et al.  Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction , 2010, MLSB.

[26]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[27]  Ran El-Yaniv,et al.  Multi-way distributional clustering via pairwise interactions , 2005, ICML.

[28]  Suyash P. Awate,et al.  Hierarchical Graphical Models for Multigroup Shape Analysis using Expectation Maximization with Sampling in Kendall's Shape Space , 2012, 1212.5720.

[29]  Bernhard Schölkopf,et al.  Kernel Principal Component Analysis , 1997, ICANN.

[30]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[31]  O. Troyanskaya,et al.  Predicting gene function in a hierarchical context with an ensemble of classifiers , 2008, Genome Biology.

[32]  Juhan Nam,et al.  Multimodal Deep Learning , 2011, ICML.

[33]  Bernhard Schölkopf,et al.  Kernel Measures of Conditional Dependence , 2007, NIPS.

[34]  Michael I. Jordan,et al.  Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces , 2004, J. Mach. Learn. Res..