PROMPT: a protein mapping and comparison tool

Background: Comparison of large protein datasets has become a standard task in bioinformatics. Typically, researchers wish to know whether one group of proteins is enriched in certain annotation attributes or sequence properties compared to another group, and whether this enrichment is statistically significant. Such comparisons often require integrating molecular sequence data and experimental information from disparate, incompatible sources. While many specialized programs exist for comparisons of this kind in individual problem domains, such as expression data analysis, no generic software solution capable of addressing a wide spectrum of routine tasks in comparative proteomics is currently available.

Results: PROMPT is a comprehensive bioinformatics software environment which enables the user to compare arbitrary protein sequence sets, revealing statistically significant differences in their annotation features. It allows automatic retrieval and integration of data from a multitude of molecular biological databases as well as from a custom XML format. Similarity-based mapping of sequence IDs makes it possible to link experimental information obtained from different sources despite discrepancies in gene identifiers and minor sequence variation. PROMPT provides a full set of statistical procedures to address the following four use cases: i) comparison of the frequencies of categorical annotations between two sets, ii) enrichment of nominal features in one set with respect to another one, iii) comparison of numeric distributions, and iv) correlation of numeric variables. Analysis results can be visualized in the form of plots and spreadsheets and exported in various formats, including Microsoft Excel.

Conclusion: PROMPT is a versatile, platform-independent, easily expandable, stand-alone application designed to be a practical workhorse in analysing and mining protein sequences and associated annotation. The availability of the Java Application Programming Interface and scripting capabilities on the one hand, and the intuitive Graphical User Interface with a context-sensitive help system on the other, makes it equally accessible to professional bioinformaticians and biologically-oriented users. PROMPT is freely available for academic users from http://webclu.bio.wzw.tum.de/prompt/.
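As an illustration of use case ii (enrichment of a nominal feature in one set with respect to another), the following is a minimal sketch in Python of a one-sided Fisher exact test based on the hypergeometric distribution. It is not PROMPT's implementation: the abstract does not state which test PROMPT applies, and the function name `enrichment_p_value` and its parameters are hypothetical, chosen only for this example.

```python
from math import comb


def enrichment_p_value(hits_a: int, size_a: int, hits_b: int, size_b: int) -> float:
    """One-sided Fisher exact p-value for over-representation in set A.

    hits_a / hits_b: number of proteins carrying the feature in set A / set B.
    size_a / size_b: total number of proteins in set A / set B.
    Hypothetical helper for illustration; not part of PROMPT.
    """
    total = size_a + size_b        # all proteins in both sets
    total_hits = hits_a + hits_b   # all proteins carrying the feature
    denom = comb(total, size_a)    # ways to choose set A from all proteins
    upper = min(size_a, total_hits)
    # Sum the hypergeometric probabilities of seeing hits_a or more
    # feature-carrying proteins in a random set of size size_a.
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, size_a - k)
        for k in range(hits_a, upper + 1)
    )
    return tail / denom


# Toy example: all 3 proteins in set A carry the feature, none of 3 in set B.
print(enrichment_p_value(3, 3, 0, 3))  # prints 0.05
```

In practice, when many features are tested at once, the resulting p-values would also need a multiple-testing correction; that step is omitted from this sketch.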
