Asymmetric latent semantic indexing for gene expression experiments visualization

We propose a new method for visualizing gene expression experiments, inspired by the latent semantic indexing technique originally proposed for text analysis. Using the correspondences word-gene and document-experiment, we define an asymmetric similarity measure of association between genes that accounts for potential hierarchies in the data, which is key to obtaining meaningful gene mappings. We use the polar decomposition to extract the sources of asymmetry in the similarity matrix, which are then combined with prior knowledge. Classes of genes are identified by means of a mixture model applied in the latent space of the genes. We describe the steps of the procedure and show its utility on the Human Cancer dataset.
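
As a rough illustration of the pipeline outlined above, the following sketch builds an asymmetric gene-gene similarity matrix, takes its polar decomposition, and derives low-dimensional gene coordinates from the symmetric factor. The abstract does not state the similarity formula or how the factors are used, so everything specific here is an assumption: the overlap-based measure `s_ij = sum_k min(x_ik, x_jk) / sum_k x_ik`, the function names, the use of the symmetric factor for the embedding, and the random toy data are all hypothetical stand-ins. The mixture-model step and the combination with prior knowledge are not shown.

```python
import numpy as np

def asymmetric_similarity(X):
    """Assumed measure (not from the paper): s_ij = sum_k min(x_ik, x_jk) / sum_k x_ik.

    X is a nonnegative genes-by-experiments matrix with nonzero row sums.
    In general s_ij != s_ji, so the result is asymmetric.
    """
    X = np.asarray(X, dtype=float)
    overlap = np.minimum(X[:, None, :], X[None, :, :]).sum(axis=2)
    totals = X.sum(axis=1, keepdims=True)
    return overlap / totals

def polar_decomposition(S):
    """Right polar decomposition S = U @ P, computed from the SVD of S.

    U is orthogonal and P is symmetric positive semidefinite.
    """
    W, sigma, Vt = np.linalg.svd(S)
    U = W @ Vt
    P = Vt.T @ np.diag(sigma) @ Vt
    return U, P

def latent_coordinates(P, n_dims=2):
    """Gene coordinates from the leading eigenpairs of the symmetric factor P."""
    eigvals, eigvecs = np.linalg.eigh(P)
    order = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy data (hypothetical): 6 genes measured in 4 experiments.
rng = np.random.default_rng(0)
X = rng.random((6, 4))

S = asymmetric_similarity(X)
U, P = polar_decomposition(S)
Z = latent_coordinates(P, n_dims=2)  # one 2-D point per gene
```

The rows of `Z` could then be passed to a clustering step such as a Gaussian mixture model to group genes in the latent space; that choice of mixture family is likewise an assumption, since the abstract only says "a mixture model".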
