PatMatch: a program for finding patterns in peptide and nucleotide sequences

Here, we present PatMatch, an efficient, web-based pattern-matching program for finding short nucleotide or peptide sequences, such as cis-elements in nucleotide sequences or small domains and motifs in protein sequences. The program finds matches to a user-specified sequence pattern, which can be described using ambiguous sequence codes and a flexible pattern syntax based on regular expressions. A recent upgrade has improved performance and added support for combining mismatches and wildcards in a single pattern. This enhancement was achieved by replacing the previous search algorithm, scan_for_matches [D'Souza et al. (1997), Trends in Genetics, 13, 497–498], with nondeterministic-reverse grep (NR-grep), a general pattern-matching tool that allows for approximate string matching [Navarro (2001), Software Practice and Experience, 31, 1265–1312]. We have tailored NR-grep for DNA and protein searches with PatMatch. The stand-alone version of the software can be adapted for use with any sequence dataset and is available for download from The Arabidopsis Information Resource (TAIR). The PatMatch server is available on the web for searching Arabidopsis thaliana sequences.
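
To make the idea of searching with ambiguous sequence codes and mismatches concrete, the following is a minimal Python sketch. It is not PatMatch's implementation and does not use NR-grep or the PatMatch pattern syntax; it only assumes the standard IUPAC nucleotide ambiguity codes. The function names (iupac_to_regex, find_matches, find_with_mismatches) and the example pattern and sequences are hypothetical, chosen for illustration. Exact matching is done by expanding the codes into a regular expression, and mismatch-tolerant matching by a simple sliding-window scan, which is far slower than a bit-parallel approach such as NR-grep.

```python
import re

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def iupac_to_regex(pattern):
    """Expand an IUPAC-coded pattern into a regular expression string."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]  # raises KeyError for an unknown code
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_matches(pattern, sequence):
    """Return (start, matched_text) for every exact match, including overlaps."""
    # A lookahead with a capture group reports overlapping occurrences too.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


def find_with_mismatches(pattern, sequence, max_mismatches):
    """Return (start, window, mismatches) for windows within the mismatch limit.

    Naive sliding-window scan (hypothetical helper, not the NR-grep algorithm).
    """
    pattern = pattern.upper()
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        mismatches = sum(
            1 for code, base in zip(pattern, window) if base not in IUPAC[code]
        )
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


if __name__ == "__main__":
    # Illustrative pattern and sequences only; not taken from the paper.
    print(iupac_to_regex("RCCGAC"))
    print(find_matches("RCCGAC", "TTGCCGACATACCGACGG"))
    print(find_with_mismatches("CCGAC", "TTCCGTCAA", 1))
```

In this sketch the ambiguity code R expands to the character class [AG], so a single pattern covers both purine variants, while the mismatch scan reports any window that differs from the pattern at no more than the allowed number of positions.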

[1] E. Stockinger, et al. Arabidopsis thaliana CBF1 encodes an AP2 domain-containing transcriptional activator that binds to the C-repeat/DRE, a cis-acting DNA regulatory element that stimulates transcription in response to low temperature and water deficit, 1997, Proceedings of the National Academy of Sciences of the United States of America.

[2] C. Ball, et al. Genetic and physical maps of Saccharomyces cerevisiae, 1997, Nature.

[3] R. Overbeek, et al. Searching for patterns in genomic data, 1997, Trends in genetics : TIG.

[4] Gonzalo Navarro, et al. A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching, 1998, CPM.

[5] Graziano Pesole, et al. PatSearch: a pattern matcher software that finds functional elements in nucleotide and protein sequences and assesses their statistical significance, 2000, Bioinform.

[6] Wen Huang, et al. The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant, 2001, Nucleic Acids Res.

[7] Gonzalo Navarro, et al. NR‐grep: a fast and flexible pattern‐matching tool, 2001, Softw. Pract. Exp.

[8] Douglas L. Brutlag, et al. The EMOTIF database, 2001, Nucleic Acids Res.

[9] Jungwon Yoon, et al. The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community, 2003, Nucleic Acids Res.

[10] Graziano Pesole, et al. PatSearch: a program for the detection of patterns and structural motifs in nucleotide sequences, 2003, Nucleic Acids Res.