CDvist: a webserver for identification and visualization of conserved domains in protein sequences

SUMMARY Identifying domains in protein sequences allows the proteins to be assigned biological functions. Several webservers identify protein domains by similarity searches against various databases of protein domain models. However, none of them provides comprehensive domain coverage while also allowing bulk querying, and their visualization schemes can be improved. To address these issues, we developed CDvist (a comprehensive domain visualization tool), which combines the best available search algorithms and databases into a user-friendly framework. A given protein sequence is first matched to domain models using high-specificity tools; only then are the unmatched segments subjected to more sensitive algorithms, resulting in the most comprehensive coverage possible. Bulk querying and rich visualization and download options improve the functionality available for domain architecture analysis. AVAILABILITY AND IMPLEMENTATION Freely available on the web at http://cdvist.utk.edu. CONTACT oadebali@vols.utk.edu or ijouline@utk.edu.
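The two-stage strategy described in the summary (high-specificity search first, then more sensitive search restricted to unmatched segments) can be illustrated with a minimal Python sketch. This is not the CDvist implementation: the function names, the half-open coordinate convention, the 30-residue minimum gap length, and the two mock searchers are all assumptions made for illustration. In practice each searcher would wrap a real search tool and database.

```python
from typing import Callable, List, Tuple

Hit = Tuple[int, int, str]             # (start, end, domain name), 0-based, half-open
Searcher = Callable[[str], List[Hit]]  # hypothetical: hits are relative to the segment passed in


def unmatched_segments(length: int, hits: List[Hit], min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) stretches not covered by any hit and at least min_len long."""
    gaps = []
    cursor = 0
    for start, end, _ in sorted(hits):
        if start - cursor >= min_len:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if length - cursor >= min_len:
        gaps.append((cursor, length))
    return gaps


def tiered_search(sequence: str, tiers: List[Searcher], min_len: int = 30) -> List[Hit]:
    """Run searchers in order; each one only sees segments left unmatched so far."""
    hits: List[Hit] = []
    for search in tiers:
        new_hits = []
        for gap_start, gap_end in unmatched_segments(len(sequence), hits, min_len):
            for start, end, name in search(sequence[gap_start:gap_end]):
                # Shift segment-relative coordinates back to full-sequence coordinates.
                new_hits.append((gap_start + start, gap_start + end, name))
        hits.extend(new_hits)
    return sorted(hits)


# Mock searchers standing in for real tools (assumptions, for demonstration only).
def specific(segment: str) -> List[Hit]:
    # Pretends to find one domain, but only in the full 100-residue query.
    return [(0, 40, "A")] if len(segment) == 100 else []


def sensitive(segment: str) -> List[Hit]:
    # Pretends to find a weaker match in any segment of 50 residues or more.
    return [(10, 50, "B")] if len(segment) >= 50 else []


result = tiered_search("M" * 100, [specific, sensitive])
print(result)
```

Ordering the tiers from most specific to most sensitive means that a confident match is never overwritten by a weaker one, since later searchers are only given the residues that earlier tiers left uncovered.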