Probabilistic alignment of motifs with sequences

MOTIVATION: Motif detection is an important component of the classification and annotation of protein sequences. A method for aligning motifs with an amino acid sequence is introduced. The motifs can be described by the secondary (e.g. functional or biophysical) characteristics of the signal or pattern to be detected. The reported results are based on the statistical relevance of the alignment. The method was designed to avoid problems encountered in other currently available methods, namely over-fitting, difficulty of biological interpretation and lack of mathematical soundness.

RESULTS: The method was tested on lipoprotein signals in B. subtilis and yielded stable results. The signal predictions were consistent with those of other methods wherever published results were available for comparison.

AVAILABILITY: An implementation of the motif alignment, refinement and bootstrapping is available for public use online at http://www.expasy.org/tools/patoseq/
