Motif-guided sparse decomposition of gene expression data for regulatory module identification

Background: Genes work coordinately as gene modules or gene networks. Various computational approaches have been proposed to find gene modules based on gene expression data; for example, gene clustering is a popular method for grouping genes with similar gene expression patterns. However, traditional gene clustering often yields unsatisfactory results for regulatory module identification because the resulting gene clusters are co-expressed but not necessarily co-regulated.

Results: We propose a novel approach, motif-guided sparse decomposition (mSD), to identify gene regulatory modules by integrating gene expression data and DNA sequence motif information. The mSD approach is implemented as a two-step algorithm comprising estimates of (1) transcription factor activity and (2) the strength of the predicted gene regulation event(s). Specifically, a motif-guided clustering method is first developed to estimate the transcription factor activity of a gene module; sparse component analysis is then applied to estimate the regulation strength and thereby predict the target genes of the transcription factors. The mSD approach was first tested for its improved performance in finding regulatory modules using simulated and real yeast data, revealing functionally distinct gene modules enriched with biologically validated transcription factors. We then demonstrated the efficacy of the mSD approach on breast cancer cell line data and uncovered several important gene regulatory modules related to endocrine therapy of breast cancer.

Conclusion: We have developed a new integrated strategy, namely motif-guided sparse decomposition (mSD) of gene expression data, for regulatory module identification. The mSD method features a novel motif-guided clustering method for transcription factor activity estimation by finding a balance between co-regulation and co-expression. The mSD method further utilizes a sparse decomposition method for regulation strength estimation. The experimental results show that such a motif-guided strategy can provide context-specific regulatory modules in both yeast and breast cancer studies.
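
To make the two-step structure concrete, the following is a minimal sketch, not the authors' mSD implementation. It assumes a genes-by-samples expression matrix `X` and a binary genes-by-TFs motif matrix `M`. Step 1 is simplified to averaging the expression of motif-bearing genes (a stand-in for motif-guided clustering), and step 2 replaces sparse component analysis with an L1-penalized least-squares fit solved by iterative soft thresholding (ISTA). The function names and the parameters `lam` and `n_iter` are hypothetical.

```python
import numpy as np


def estimate_tf_activity(X, M):
    """Step 1 (simplified stand-in): TF activity as the mean expression
    of the genes that carry the TF's motif.

    X: (genes x samples) expression matrix.
    M: (genes x TFs) binary motif-occurrence matrix.
    Returns a (TFs x samples) activity matrix.
    """
    counts = M.sum(axis=0).astype(float)
    counts[counts == 0] = 1.0  # avoid dividing by zero for TFs with no motif hits
    return (M.T @ X) / counts[:, None]


def soft_threshold(Z, t):
    """Elementwise soft-thresholding operator used by ISTA."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)


def sparse_strengths(X, A, lam=0.1, n_iter=500):
    """Step 2 (assumed formulation): regulation strengths S minimizing
    0.5 * ||X - S A||_F^2 + lam * ||S||_1, solved by ISTA.

    X: (genes x samples), A: (TFs x samples).
    Returns S: (genes x TFs); nonzero entries suggest TF-target links.
    """
    # Lipschitz constant of the gradient: largest eigenvalue of A A^T.
    L = np.linalg.norm(A @ A.T, 2)
    S = np.zeros((X.shape[0], A.shape[0]))
    if L == 0:
        return S  # all-zero activities carry no information
    for _ in range(n_iter):
        grad = (S @ A - X) @ A.T  # gradient of the least-squares term
        S = soft_threshold(S - grad / L, lam / L)
    return S
```

A typical call would be `A = estimate_tf_activity(X, M)` followed by `S = sparse_strengths(X, A)`, with the predicted targets of each transcription factor read off the nonzero entries of the corresponding column of `S`. Larger values of `lam` yield sparser target sets.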
