Sparse multi-view matrix factorization: a multivariate approach to multiple tissue comparisons

MOTIVATION: Within any given tissue, gene expression levels can vary extensively among individuals. Such heterogeneity can be caused by genetic and epigenetic variability and may contribute to disease. The abundance of experimental data now enables the identification of features of gene expression profiles that are shared across tissues and those that are tissue-specific. While most current research characterizes differential expression by comparing mean expression profiles across tissues, it is believed that a significant difference in the variance of a gene's expression across tissues may also be associated with molecular mechanisms that are important for tissue development and function.

RESULTS: We propose a sparse multi-view matrix factorization (sMVMF) algorithm to jointly analyse gene expression measurements in multiple tissues, where each tissue provides a different 'view' of the underlying organism. The proposed methodology can be interpreted as an extension of principal component analysis in that it decomposes the total sample variance in each tissue into the sum of two components: one capturing the variance that is shared across tissues and one isolating the tissue-specific variance. We used sMVMF to jointly model mRNA expression profiles in three tissues obtained from a large and well-phenotyped twins cohort, TwinsUK. Using sMVMF, we are able to prioritize genes based on whether their variation patterns are specific to each tissue. Furthermore, using the available DNA methylation profiles, we provide supporting evidence that adipose-specific gene expression patterns may be driven by epigenetic effects.

AVAILABILITY AND IMPLEMENTATION: Python code is available at http://wwwf.imperial.ac.uk/~gmontana/.

CONTACT: giovanni.montana@kcl.ac.uk

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
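
The variance decomposition described under RESULTS can be illustrated with a small sketch. This is not the sMVMF algorithm and not the authors' released code: it leaves out the sparsity penalty altogether and uses plain singular value decompositions in place of the joint factorization. The function name, the parameters `k_shared` and `k_specific`, and the synthetic tissue labels are all hypothetical, and a third "unexplained" term is reported because only a few components are kept. The sketch assumes every view measures the same genes in the same column order.

```python
import numpy as np


def shared_specific_variance(views, k_shared=2, k_specific=2):
    """Split each view's total sample variance into shared, view-specific
    and unexplained parts (illustrative only, no sparsity).

    views: dict mapping a view name to an (n_samples, n_genes) array.
    """
    # Centre each gene within each view, as in ordinary PCA.
    centered = {name: X - X.mean(axis=0) for name, X in views.items()}

    # Shared directions: top right singular vectors of the row-stacked views.
    stacked = np.vstack(list(centered.values()))
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    V = vt[:k_shared].T  # (n_genes, k_shared), orthonormal columns

    result = {}
    for name, X in centered.items():
        dof = X.shape[0] - 1
        shared_part = X @ V @ V.T          # projection onto shared directions
        residual = X - shared_part         # what the shared part cannot explain

        # View-specific directions: top right singular vectors of the residual.
        _, _, wt = np.linalg.svd(residual, full_matrices=False)
        W = wt[:k_specific].T
        specific_part = residual @ W @ W.T
        leftover = residual - specific_part

        # The three parts are mutually orthogonal, so their sums of squares
        # add up to the total sum of squares of the centred view.
        result[name] = {
            "total": float(np.sum(X ** 2) / dof),
            "shared": float(np.sum(shared_part ** 2) / dof),
            "specific": float(np.sum(specific_part ** 2) / dof),
            "unexplained": float(np.sum(leftover ** 2) / dof),
        }
    return result


if __name__ == "__main__":
    # Synthetic data only; the tissue labels are placeholders.
    rng = np.random.default_rng(0)
    demo = {f"tissue_{i}": rng.normal(size=(40, 25)) for i in range(3)}
    for tissue, parts in shared_specific_variance(demo).items():
        print(tissue, parts)
```

Because both steps are orthogonal projections, the shared, specific and unexplained terms sum to the total sample variance of each view. The sparse factorization described in the abstract would additionally penalize the loadings so that only a subset of genes contributes to each component.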
