Analysis of Co-Associated Transcription Factors via Ordered Adjacency Differences on Motif Distribution

Transcription factors (TFs), which bind to specific DNA sequences or motifs, are fundamental to the regulation of transcription. A gene is typically regulated by a combination of TFs that bind in close proximity to one another. Identifying such co-associated TFs (co-TFs) is therefore an important problem in understanding the mechanism of transcriptional regulation. Recently, ChIP-seq mapping of TF binding sites has provided a large amount of experimental data for analyzing co-TFs. Several studies show that if two TFs are co-associated, the relative distance between their binding sites exhibits a peak-like distribution. To analyze co-TFs, we develop a novel method that evaluates the degree of association between TFs. We design an adjacency score based on ordered differences, which reflects co-TF binding affinities in motif analysis. For every candidate motif, we calculate the corresponding adjacency score and then rank the motifs in descending order of score. From these ranked lists, we can identify the co-TFs of candidate motifs. On ChIP-seq datasets, our method achieves the best AUC on five datasets: 0.9432 for NMYC, 0.9109 for KLF4, 0.9006 for ZFX, 0.8892 for ESRRB, and 0.8920 for E2F1. The method is also stable on datasets with large sample sizes, and its AUC is above 0.8 on all datasets.
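The score-then-rank-then-AUC workflow described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the adjacency score formula, so `adjacency_score` below is a hypothetical stand-in (the inverse of the median gap between adjacent sorted distances) chosen because it rewards a peak-like distribution. The function names, the motif names, and the toy distances (in bp) are likewise invented for the example.

```python
from statistics import median


def adjacency_score(distances):
    """Hypothetical stand-in, NOT the paper's formula.

    Sort the relative distances, take the differences between adjacent
    ordered values, and return a score that is high when most of those
    gaps are small, i.e. when the distances pile up in a peak.
    """
    ordered = sorted(distances)
    if len(ordered) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return 1.0 / (1.0 + median(gaps))


def rank_motifs(distances_by_motif):
    """Return motif names in descending order of adjacency score."""
    scores = {m: adjacency_score(d) for m, d in distances_by_motif.items()}
    return sorted(scores, key=scores.get, reverse=True)


def auc(ranked, positives):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive motif is ranked above the negative one. Ties are not
    handled because the input is already a strict ranking."""
    n_pos = sum(1 for m in ranked if m in positives)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative motif")
    pos_seen = 0
    correct_pairs = 0
    for m in ranked:
        if m in positives:
            pos_seen += 1
        else:
            # Every positive seen so far outranks this negative.
            correct_pairs += pos_seen
    return correct_pairs / (n_pos * n_neg)


# Toy data (assumed): distances between motif occurrences and the binding site.
toy_distances = {
    "motif_A": [-5, -3, -2, -1, 0, 1, 2, 4, 150],      # tight peak, one outlier
    "motif_B": [-180, -120, -60, 0, 60, 120, 180],     # evenly spread
    "motif_C": [-10, -4, -1, 0, 2, 3, 9],              # looser peak
}
ranking = rank_motifs(toy_distances)
print(ranking)
print(auc(ranking, positives={"motif_A", "motif_C"}))
```

The median is used rather than the mean because the mean of adjacent gaps reduces to the overall range divided by the number of gaps, which says nothing about whether the distances form a peak.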
