Identification of Transcription Factor Binding Sites in Promoter Regions by Modularity Analysis of the Motif Co-occurrence Graph

Many algorithms have been proposed to date for the problem of finding biologically significant motifs in promoter regions. They can be classified into two large families: combinatorial methods and probabilistic methods. Probabilistic methods have been used more extensively, since their output is easier to interpret. Combinatorial methods have the potential to identify hard-to-detect motifs, but their output is much harder to interpret, since it may consist of hundreds or thousands of motifs. In this work, we propose a method that processes the output of combinatorial motif finders in order to find groups of motifs that represent variations of the same motif, thus reducing the output to a manageable size. This processing is done by building a graph that represents the co-occurrences of motifs, and finding communities in this graph. We show that this innovative approach leads to a method that is as easy to use as a probabilistic motif finder, and as sensitive to low-quorum motifs as a combinatorial motif finder. The method was integrated with two combinatorial motif finders, and made available on the Web.
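
The graph-and-communities step can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: two motifs are taken to co-occur when they are reported in the same sequence at start positions at most `max_shift` apart, edge weights count such shared sites, and communities are found by a simple greedy agglomeration that merges the pair of communities with the largest modularity gain. The function names, the `max_shift` parameter, and the toy occurrence data are hypothetical and serve only as illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(occurrences, max_shift=2):
    """Build weighted edges between motifs that occur at nearby sites.

    occurrences maps motif -> set of (sequence_id, start) pairs.
    The co-occurrence rule (same sequence, starts within max_shift)
    is an assumption made for this sketch.
    """
    edges = {}
    for a, b in combinations(sorted(occurrences), 2):
        weight = sum(
            1
            for seq_a, pos_a in occurrences[a]
            for seq_b, pos_b in occurrences[b]
            if seq_a == seq_b and abs(pos_a - pos_b) <= max_shift
        )
        if weight:
            edges[(a, b)] = weight
    return edges


def greedy_communities(nodes, edges):
    """Greedily merge communities while modularity keeps increasing."""
    communities = {n: {n} for n in nodes}  # label -> member motifs
    m = sum(edges.values())                # total edge weight
    if m == 0:
        return list(communities.values())
    degree = defaultdict(float)            # summed degree per community
    between = defaultdict(float)           # weight between two communities
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
        between[frozenset((a, b))] += w
    while True:
        best_gain, best_pair = 0.0, None
        for pair, w in between.items():
            c, d = tuple(pair)
            # Modularity gain of merging communities c and d.
            gain = w / m - degree[c] * degree[d] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, pair
        if best_pair is None:
            break  # no merge improves modularity
        c, d = sorted(best_pair)
        communities[c] |= communities.pop(d)
        degree[c] += degree.pop(d)
        merged = defaultdict(float)
        for pair, w in between.items():
            relabeled = frozenset(c if x == d else x for x in pair)
            if len(relabeled) == 2:  # drop edges now internal to c
                merged[relabeled] += w
        between = merged
    return list(communities.values())


# Toy data (hypothetical): motif -> sites reported by a motif finder.
occurrences = {
    "TGACTC": {("s1", 10), ("s2", 40)},
    "TGACTA": {("s1", 10), ("s3", 7)},
    "GACTCA": {("s1", 11), ("s2", 41)},
    "CACGTG": {("s1", 80), ("s4", 5)},
    "CACGTT": {("s4", 5), ("s2", 90)},
}
edges = cooccurrence_graph(occurrences)
groups = greedy_communities(occurrences, edges)
```

On this toy input the three shifted or mismatched variants of one site end up in one group and the two variants of the other site in a second group, which is the kind of output reduction the abstract describes. A production implementation would likely rely on an established graph library rather than this quadratic-time loop.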
