Identifying multi-layer gene regulatory modules from multi-dimensional genomic data

Motivation: Eukaryotic gene expression (GE) is subject to precisely coordinated multi-layer control, spanning epigenetic, transcriptional and post-transcriptional regulation. Recently emerging multi-dimensional genomic datasets provide unprecedented opportunities to study this cross-layer regulatory interplay. In these datasets, the same set of samples is profiled on several layers of genomic activity, e.g. copy number variation (CNV), DNA methylation (DM), GE and microRNA expression (ME). However, suitable analysis methods for such data are currently scarce.

Results: In this article, we introduce a sparse Multi-Block Partial Least Squares (sMBPLS) regression method to identify multi-dimensional regulatory modules from this new type of data. A multi-dimensional regulatory module contains sets of regulatory factors from different layers that are likely to jointly contribute to a local ‘gene expression factory’. We demonstrate the performance of our method on simulated data as well as on The Cancer Genome Atlas (TCGA) ovarian cancer datasets, which include CNV, DM, ME and GE data measured on 230 samples. We show that the majority of identified modules have significant functional and transcriptional enrichment, higher than that observed in modules identified using only a single type of genomic data. Our network analysis of the modules reveals that CNV, DM and microRNA can have a coupled impact on the expression of important oncogenes and tumor suppressor genes.

Availability and implementation: The source code, implemented in MATLAB, is freely available at: http://zhoulab.usc.edu/sMBPLS/.

Contact: xjzhou@usc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
