Integrating transcription factor binding site information with gene expression datasets

MOTIVATION: Microarrays are widely used to measure gene expression differences between sets of biological samples. Many of these differences arise from changes in the activity of transcription factors. In principle, such changes can be detected by associating motifs in promoters with differences in gene expression levels between the groups. In practice, this is hard to do.

RESULTS: We combine correspondence analysis, between-group analysis and co-inertia analysis to determine which motifs, from a database of promoter motifs, are strongly associated with differences in gene expression levels. Given a database of motifs and gene expression levels from a set of arrays, the method produces a ranked list of motifs associated with any specified split in the arrays. We give an example using the Gene Atlas compendium of gene expression levels for human tissues, in which we search for motifs associated with expression in central nervous system (CNS) or muscle tissues. Most of the motifs that we find are known from previous work to be strongly associated with expression in CNS or muscle. We give a second example using a published prostate cancer dataset, in which we can simply and clearly find which transcriptional pathways are associated with differences between benign and metastatic samples.

AVAILABILITY: The source code is freely available upon request from the authors.
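To make the idea of ranking motifs against a split in the arrays concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a two-way split (a target group versus all other arrays), in which case the between-group step yields a single axis proportional to the difference of group means, and the co-inertia step reduces to a covariance between that axis and each motif column. It omits the correspondence-analysis weighting entirely and uses plain centring instead. The function name `rank_motifs`, its parameters, and the toy data are all hypothetical.

```python
from statistics import mean


def rank_motifs(expression, groups, motif_scores, motif_names, target):
    """Rank motifs by association with a target-vs-rest split of the arrays.

    expression:   one row per gene, one column per array
    groups:       one group label per array
    motif_scores: one row per gene, one column per motif
    target:       label of the group of interest
    """
    # Between-group step (two-group case): one discriminating axis, with each
    # gene's coordinate proportional to the difference of the group means.
    gene_axis = []
    for row in expression:
        in_target = [x for x, g in zip(row, groups) if g == target]
        in_rest = [x for x, g in zip(row, groups) if g != target]
        gene_axis.append(mean(in_target) - mean(in_rest))
    axis_centre = mean(gene_axis)
    gene_axis = [v - axis_centre for v in gene_axis]

    # Co-inertia step (simplified): covariance across genes between each
    # centred motif column and the gene coordinates on the between-group axis.
    n_genes = len(expression)
    ranked = []
    for j, name in enumerate(motif_names):
        column = [row[j] for row in motif_scores]
        col_centre = mean(column)
        cov = sum((c - col_centre) * v for c, v in zip(column, gene_axis)) / n_genes
        ranked.append((name, cov))

    # Positive scores point towards the target group, negative towards the rest.
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked


# Hypothetical toy data: 4 genes, 4 arrays, 2 motifs.
expression = [
    [1.0, 1.2, 5.0, 5.1],
    [0.9, 1.1, 4.8, 5.3],
    [3.0, 3.1, 3.0, 2.9],
    [5.0, 5.2, 1.0, 1.1],
]
groups = ["other", "other", "muscle", "muscle"]
motif_names = ["motif_A", "motif_B"]
motif_scores = [
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 0],
]

ranked = rank_motifs(expression, groups, motif_scores, motif_names, target="muscle")
for name, score in ranked:
    print(name, round(score, 3))
```

In this toy input, motif_A occurs only in the promoters of the two genes that are higher in the target group, so it comes out at the top of the list with a positive score. A fuller treatment would handle more than two groups and would replace the plain centring with the chi-square weighting of correspondence analysis.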

[1]  P. Carbon,et al.  ZNF76 and ZNF143 Are Two Human Homologs of the Transcriptional Activator Staf* , 1998, The Journal of Biological Chemistry.

[2]  J D Siegal,et al.  Enhanced expression of the c‐myc protooncogene in high‐grade human prostate cancers , 1988, The Prostate.

[3]  D. Peehl,et al.  Tumor-suppression function of transcription factor USF2 in prostate carcinogenesis , 2006, Oncogene.

[4]  Rafael A Irizarry,et al.  Exploration, normalization, and summaries of high density oligonucleotide array probe level data. , 2003, Biostatistics.

[5]  C. Fujiyama,et al.  Expression of hypoxia-inducible factor 1alpha in human normal, benign, and malignant prostate tissue. , 2003, Chinese medical journal.

[6]  Jun S. Liu,et al.  Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. , 1993, Science.

[7]  Jiang Chang,et al.  Inhibitory Cardiac Transcription Factor, SRF-N, Is Generated by Caspase 3 Cleavage in Human Heart Failure and Attenuated by Ventricular Unloading , 2003, Circulation.

[8]  Guy Perrière,et al.  Between-group analysis of microarray data , 2002, Bioinform..

[9]  J. Mohler,et al.  Association of prostate cancer with vitamin D receptor gene polymorphism. , 1996, Cancer research.

[10]  R. Tibshirani,et al.  Significance analysis of microarrays applied to the ionizing radiation response , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[11]  Dongmei Xiao,et al.  GRP receptor-mediated immediate early gene expression and transcription factor Elk-1 activation in prostate cancer cells , 2002, Regulatory Peptides.

[12]  Shuqiu Zheng,et al.  The Transcription Factor Regulatory Factor X1 Increases the Expression of Neuronal Glutamate Transporter Type 3* , 2006, Journal of Biological Chemistry.

[13]  A. Orth,et al.  Large-scale analysis of the human and mouse transcriptomes , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[14]  Rainer Breitling,et al.  Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments , 2004, FEBS letters.

[15]  Tomoko Tahira,et al.  Characterization of the biological functions of a transcription factor, c-myc intron binding protein 1 (MIBP1). , 2002, Journal of biochemistry.

[16]  Suresh Karanam,et al.  CONFAC: automated application of comparative genomic promoter analysis to DNA microarray datasets , 2004, Nucleic Acids Res..

[17]  Charles Elkan,et al.  Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer , 1994, ISMB.

[18]  A. Höskuldsson PLS regression methods , 1988 .

[19]  J. Fransen,et al.  Colocalisation of the protein tyrosine phosphatases PTP-SL and PTPBR7 with β4-adaptin in neuronal cells , 2002, Histochemistry and Cell Biology.

[20]  B. De Moor,et al.  Toucan: deciphering the cis-regulatory logic of coregulated genes. , 2003, Nucleic acids research.

[21]  S. Dolédec,et al.  Co‐inertia analysis: an alternative method for studying species–environment relationships , 1994 .

[22]  D C Utz,et al.  Androgen receptor binding activity in human prostate cancer , 1985, Cancer.

[23]  A Kumar,et al.  MyoD transactivates angiotensinogen promoter in fibroblast C3H10T1/2 cells. , 1993, Cellular & molecular biology research.

[24]  K. Lindblad-Toh,et al.  Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals , 2005, Nature.

[25]  R. Y. Tsai,et al.  Cloning and Functional Characterization of Roaz, a Zinc Finger Protein that Interacts with O/E-1 to Regulate Gene Expression: Implications for Olfactory Neuronal Development , 1997, The Journal of Neuroscience.

[26]  C. N. Coleman,et al.  Constitutive activation of IκB kinase α and NF-κB in prostate cancer cells is inhibited by ibuprofen , 1999, Oncogene.

[27]  J. Fickett,et al.  Identification of regulatory regions which confer muscle-specific gene expression. , 1998, Journal of molecular biology.

[28]  L. Recht,et al.  High-resolution genome-wide mapping of genetic alterations in human glial brain tumors. , 2005, Cancer research.

[29]  Wyeth W. Wasserman,et al.  JASPAR: an open-access database for eukaryotic transcription factor binding profiles , 2004, Nucleic Acids Res..

[30]  R. Agarwal,et al.  Impairment of erbB1 receptor and fluid-phase endocytosis and associated mitogenic signaling by inositol hexaphosphate in human prostate carcinoma DU145 cells. , 2000, Carcinogenesis.

[31]  C. N. Coleman,et al.  Constitutive activation of IkappaB kinase alpha and NF-kappaB in prostate cancer cells is inhibited by ibuprofen. , 1999, Oncogene.

[32]  T. Hasan,et al.  p53 expression and clinical outcome in prostate cancer. , 1993, British journal of urology.

[33]  Rosalind Eeles,et al.  Transcription factor E2F3 overexpressed in prostate cancer independently predicts clinical outcome , 2004, Oncogene.

[34]  John T. Wei,et al.  Integrative genomic and proteomic analysis of prostate cancer reveals signatures of metastatic progression. , 2005, Cancer cell.

[35]  C. Murre,et al.  Localization of Pbx1 transcripts in developing rat embryos , 1995, Mechanisms of Development.

[36]  E E Büllesbach,et al.  Specific, High Affinity Relaxin-like Factor Receptors* , 1999, The Journal of Biological Chemistry.

[37]  R. Treisman,et al.  Human SRF-related proteins: DNA-binding properties and potential regulatory targets. , 1991, Genes & development.

[38]  J. Hoheisel,et al.  Correspondence analysis applied to microarray data , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[39]  Webb Miller,et al.  PipTools: a computational toolkit to annotate and analyze pairwise comparisons of genomic sequences. , 2002, Genomics.

[40]  P Gruss,et al.  Pax: gene regulators in the developing nervous system. , 1993, Journal of neurobiology.

[41]  S. Krane,et al.  Specific high-affinity receptors for 1,25-dihydroxyvitamin D3 in human peripheral blood mononuclear cells: presence in monocytes and induction in T lymphocytes following activation. , 1983, The Journal of clinical endocrinology and metabolism.

[42]  Thomas Lemberger,et al.  SRF mediates activity-induced gene expression and synaptic plasticity but not neuronal viability , 2005, Nature Neuroscience.

[43]  Daniel Chessel,et al.  Rythmes saisonniers et composantes stationnelles en milieu aquatique. I: Description d'un plan d'observation complet par projection de variables , 1987 .

[44]  Thomas Braun,et al.  VITO-1 is an essential cofactor of TEF1-dependent muscle-specific gene regulation. , 2004, Nucleic acids research.

[45]  A. Fernandez,et al.  Serum response factor p67SRF is expressed and required during myogenic differentiation of both mouse C2 and rat L6 muscle cell lines , 1992, The Journal of cell biology.

[46]  David J. Arenillas,et al.  oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes , 2005, Nucleic acids research.

[47]  Quang-Dé Nguyen,et al.  G-protein αolf subunit promotes cellular invasion, survival, and neuroendocrine differentiation in digestive and urogenital epithelial cells , 2002, Oncogene.

[48]  Chih-Pin Chuu,et al.  Antiproliferative Effect of Liver X Receptor Agonists on LNCaP Human Prostate Cancer Cells , 2004, Cancer Research.

[49]  Theresia Thalhammer,et al.  Expression of the aryl hydrocarbon receptor (AhR) and the aryl hydrocarbon receptor nuclear translocator (ARNT) in fetal, benign hyperplastic, and malignant prostate , 1998, The Prostate.

[50]  D J Anderson,et al.  The neuron-restrictive silencer factor (NRSF): a coordinate repressor of multiple neuron-specific genes , 1995, Science.

[51]  Holger Karas,et al.  TRANSFAC: a database on transcription factors and their DNA binding sites , 1996, Nucleic Acids Res..

[52]  K. Calame,et al.  The ZiN/POZ domain of ZF5 is required for both transcriptional activation and repression. , 1997, Nucleic acids research.

[53]  RAINER BREITLING,et al.  Rank-based Methods as a Non-parametric Alternative of the T-statistic for the Analysis of Biological Microarray Data , 2005, J. Bioinform. Comput. Biol..

[54]  Jean Thioulouse,et al.  ADE-4: a multivariate analysis and graphical display software , 1997, Stat. Comput..

[55]  Guy Perrière,et al.  MADE4: an R package for multivariate analysis of gene expression data , 2005, Bioinform..

[56]  Guy Perrière,et al.  Cross-platform comparison and visualisation of gene expression data using co-inertia analysis , 2003, BMC Bioinformatics.