Distinguishing between Genomic Regions Bound by Paralogous Transcription Factors

Transcription factors (TFs) regulate gene expression by binding to specific DNA sites in the cis-regulatory regions of genes. Most eukaryotic TFs belong to protein families whose members share a common DNA-binding domain and often recognize highly similar DNA sequences. It is not well understood why closely related TFs bind different genomic regions in vivo despite having the potential to interact with the same DNA sites. Here, we use the Myc/Max/Mad family as a model system to investigate whether interactions with additional proteins (co-factors) can explain why paralogous TFs with highly similar DNA-binding preferences occupy different genomic sites in vivo. We use a classification approach to distinguish between targets of c-Myc and targets of Mad2, using features that reflect the DNA-binding specificities of putative co-factors. When applied to c-Myc/Mad2 DNA-binding data, our algorithm distinguishes genomic regions bound uniquely by c-Myc from those bound uniquely by Mad2 with 87% accuracy.
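To make the idea of "features that reflect the DNA-binding specificities of putative co-factors" concrete, the following is a minimal sketch, not the method used in this work. It assumes each co-factor is summarized by a position weight matrix (PWM) and that a region's feature for that co-factor is its best log-odds match on either strand. The motif names, the matrix values, and the uniform background are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy PWMs (per-position base probabilities); values are made up.
# In practice, motifs would come from a curated motif collection.
TOY_PWMS = {
    "cofactor_A": [  # prefers the site AGC
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
        {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    ],
    "cofactor_B": [  # prefers the site TTA
        {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
        {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    ],
}
BACKGROUND = 0.25  # assumed uniform background base frequency


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def window_score(pwm, window):
    """Log-odds score (base 2) of one window against a PWM."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, window))


def best_site_score(pwm, seq):
    """Best log-odds score over all windows on both strands."""
    width = len(pwm)
    best = float("-inf")  # returned unchanged if seq is shorter than the motif
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - width + 1):
            best = max(best, window_score(pwm, strand[i:i + width]))
    return best


def region_features(seq, pwms=TOY_PWMS):
    """One feature per co-factor motif, in sorted motif-name order."""
    return [best_site_score(pwms[name], seq) for name in sorted(pwms)]


if __name__ == "__main__":
    # A region containing AGC scores high for cofactor_A and low for cofactor_B.
    print(region_features("CCAGCCC"))
```

Feature vectors of this kind, computed for regions bound uniquely by each paralog and paired with the corresponding labels, could then be passed to any standard supervised classifier.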
