Mathematical Tools for Regulatory Signals Extraction

Statistical techniques provide an efficient way to analyze in silico the huge amount of data produced by large-scale sequencing. The key idea is to search for regulatory signals among exceptional words, i.e. words that are either underrepresented or overrepresented relative to what a random sequence model predicts. We provide a few mathematical results to assess the significance of an exceptional word.
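To make the notion of an exceptional word concrete, here is a minimal Python sketch that scores every word of length k in a sequence by comparing its observed count with the count expected under a random model. It is an illustration only, not the paper's method: the function name `word_zscores`, the independent-letter (Bernoulli) background model, and the binomial variance that ignores word self-overlap are all simplifying assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def word_zscores(seq, k):
    """Return a z-score for every k-letter word occurring in seq.

    Assumptions (illustrative, not taken from the paper):
    - letters are independent, with frequencies estimated from seq itself;
    - the variance is the plain binomial one, so the correction due to
      self-overlapping words is ignored.
    """
    n = len(seq)
    positions = n - k + 1  # number of starting positions for a k-letter word
    letter_freq = {a: c / n for a, c in Counter(seq).items()}

    # Observed counts; overlapping occurrences are counted.
    observed = Counter(seq[i:i + k] for i in range(positions))

    scores = {}
    for word, count in observed.items():
        p = 1.0
        for letter in word:
            p *= letter_freq[letter]
        expected = positions * p
        variance = expected * (1.0 - p)
        if variance > 0:  # skip degenerate cases such as a one-letter alphabet
            scores[word] = (count - expected) / sqrt(variance)
    return scores


if __name__ == "__main__":
    toy = "ACGT" * 2 + "A" * 8 + "ACGT" * 2
    for word, z in sorted(word_zscores(toy, 3).items(), key=lambda t: -t[1]):
        print(word, round(z, 2))
```

A large positive score flags an overrepresented word and a large negative score an underrepresented one. Only words that occur at least once are scored here, so a word that is underrepresented to the point of being absent would need to be enumerated separately. A more careful treatment would also replace the Bernoulli background with a Markov model and account for overlaps in the variance.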
