SciDBMaker: new software for computer-aided design of specialized biological databases

Background: The exponential growth of research in molecular biology has brought a concomitant proliferation of databases for storing its findings. A variety of protein sequence databases exist. While all of these strive for completeness, the range of user interests is often beyond their scope. Large databases covering a broad range of domains tend to offer less detailed information than smaller, more specialized resources, so researchers often need to combine data from many sources to obtain a complete picture. Scientific researchers are continually developing new specialized databases to enhance their understanding of biological processes.

Description: In this article, we present the implementation of a new tool for protein data analysis. With its easy-to-use interface, this software makes it possible to build more specialized protein databases from a universal protein sequence database such as Swiss-Prot. A family of proteins known as bacteriocins is analyzed as a 'proof of concept'.

Conclusion: SciDBMaker is stand-alone software that allows the extraction of protein data from the Swiss-Prot database, sequence analysis comprising physicochemical profile calculations, homologous sequence searches and multiple sequence alignments, and the building of new and more specialized databases. It compiles information with relative ease, updates and compares various data relevant to a given protein family, and could solve the problem of dispersed biological search results.
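To make "physicochemical profile calculations" concrete, the following is a minimal Python sketch of two widely used sequence-derived descriptors: the grand average of hydropathy (GRAVY, using the Kyte-Doolittle scale) and the aliphatic index (Ikai's formula). The abstract does not state which descriptors or scales SciDBMaker actually computes, so the choice of these two, the function names `gravy` and `aliphatic_index`, and the toy peptide are assumptions made for illustration only.

```python
# Illustrative sketch only: not SciDBMaker's code. Function names and the
# example peptide are hypothetical.

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy over all residues of the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # A KeyError here signals a non-standard residue code.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    peptide = "MKAVILGLA"  # hypothetical toy peptide, not a real bacteriocin
    print(f"GRAVY: {gravy(peptide):.3f}")
    print(f"Aliphatic index: {aliphatic_index(peptide):.1f}")
```

Both functions take a one-letter amino acid sequence, such as one extracted from a Swiss-Prot entry, and return a single number, which is the kind of per-protein value that could populate a column in a specialized database.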
