Species-specific analysis of protein sequence motifs using mutual information

Background
Protein sequence motifs are, by definition, short fragments of conserved amino acids, often associated with a specific function. Accordingly, protein sequence profiles derived from multiple sequence alignments provide an alternative description of the functional motifs that characterize families of related sequences. Such profiles conveniently reflect functional constraints: they highlight similarity at conserved sequence positions and reveal divergence at variable positions. Discovering significant conservation patterns within the variable positions of a profile can reveal group-specific and, in particular, evolutionary features of the underlying sequences.

Results
We describe PROMI (PROfile analysis based on Mutual Information), a tool that enables the comparative analysis of user-classified protein sequences. PROMI is implemented as a web service; the server side uses Perl and R together with other publicly available packages and tools. On the client side, platform independence is achieved through widely used internet delivery standards. As one possible application, an analysis of the C2H2-type zinc finger protein domain is presented to illustrate the functionality of the tool.

Conclusion
The web service PROMI should assist researchers in detecting evolutionary correlations in protein profiles of defined sets of biological sequences. It is available at http://promi.mpimp-golm.mpg.de, where additional documentation can be found.
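The abstract does not give PROMI's exact scoring formula, so the following Python sketch illustrates one plausible reading under stated assumptions. It assumes that each alignment column is scored in two ways: by the Shannon entropy of its residues (separating conserved from variable positions) and by the mutual information between the residue in that column and the user-assigned class label of each sequence (for example, the source organism). The function names, the toy alignment and the class labels are hypothetical. Gaps are treated as an ordinary symbol for simplicity. This is not PROMI's Perl/R implementation.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (bits) of the residues in one alignment column.

    0.0 means the column is fully conserved; larger values mean more variable.
    """
    n = len(column)
    # sum over residues of p * log2(1/p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())


def column_class_mi(column: Sequence[str], labels: Sequence[str]) -> float:
    """Plug-in mutual information I(X; C) in bits between residue X and class C."""
    n = len(column)
    counts_x = Counter(column)   # residue counts in this column
    counts_c = Counter(labels)   # class sizes
    mi = 0.0
    for (x, c), n_xc in Counter(zip(column, labels)).items():
        # p(x,c) * log2(p(x,c) / (p(x) p(c))), expressed with raw counts
        mi += (n_xc / n) * math.log2(n_xc * n / (counts_x[x] * counts_c[c]))
    return mi


def profile_scores(alignment: List[str], labels: List[str]) -> List[Tuple[int, float, float]]:
    """Return (column index, entropy, class MI) for every alignment column."""
    if len(alignment) != len(labels):
        raise ValueError("need exactly one class label per sequence")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    scores = []
    # zip(*alignment) yields the alignment column by column
    for i, column in enumerate(zip(*alignment)):
        scores.append((i, column_entropy(column), column_class_mi(column, labels)))
    return scores


if __name__ == "__main__":
    # Hypothetical toy alignment: four sequences in two user-defined classes.
    alignment = ["CPECGK", "CPECGR", "CPDCGK", "CPDCGR"]
    labels = ["A", "A", "B", "B"]
    for i, h, mi in profile_scores(alignment, labels):
        print(f"column {i}: entropy={h:.2f} bits, MI with class={mi:.2f} bits")
```

In the toy example, columns 2 (E vs. D) and 5 (K vs. R) are equally variable, with an entropy of 1 bit each. Column 2 perfectly separates classes A and B and scores 1 bit of mutual information. Column 5 is independent of the labels and scores 0. This distinction is the kind of group-specific signal within variable positions that the Background paragraph describes. With small alignments the plug-in MI estimate is biased upward, so a real analysis would typically assess significance, for example by permuting the class labels.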
