Background
The rapidly increasing number of completely sequenced genomes led to the establishment of the COG database, which uses sequence homology to assign similar proteins from different organisms to clusters of orthologous groups (COGs). Several bioinformatic studies have used this database to identify (hyper)thermophile-specific proteins by searching for COGs that contain (almost) exclusively proteins from (hyper)thermophilic genomes. However, no public software is available for performing group-specific searches with individually definable groups.

Results
The tool described here fills this gap. The software is accessible at http://www.uni-wh.de/pcogr and is linked to the COG database. The user freely defines two groups of organisms by assigning each of the currently 66 organisms to groupA, to the reference groupB, or to neither, in which case the algorithm ignores it. A specificity index with respect to groupA is then calculated for every COG, i.e. high-scoring COGs contain proteins from most groupA organisms, while proteins from most groupB organisms are absent. In addition to ranking all COGs according to the user-defined specificity criteria, a graphical visualization shows the distribution of all COGs by displaying their abundance as a function of their specificity index.

Conclusions
This software allows the detection of COGs specific to a predefined group of organisms. All COGs are ranked in order of their specificity, and a graphical visualization makes it possible to recognize (i) the presence and abundance of such COGs and (ii) the phylogenetic relationship between groupA and groupB organisms. The software also allows the detection of putative protein-protein interactions, novel enzymes involved in only partially known biochemical pathways, and alternative enzymes that arose by convergent evolution.
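The ranking described under Results can be illustrated with a minimal Python sketch. The abstract does not state the formula of the specificity index, so the definition used here (fraction of groupA organisms present in a COG minus fraction of groupB organisms present) is only an assumption chosen to match the described behaviour; the function names, COG identifiers, and organism labels are likewise hypothetical and do not come from the tool or the COG database.

```python
from collections import Counter


def specificity_index(cog_members, group_a, group_b):
    """Assumed index in [-1, 1]; NOT the tool's published formula.

    +1: every groupA organism is present and no groupB organism is.
    -1: every groupB organism is present and no groupA organism is.
    """
    if not group_a or not group_b:
        raise ValueError("groupA and groupB must both be non-empty")
    frac_a = len(cog_members & group_a) / len(group_a)
    frac_b = len(cog_members & group_b) / len(group_b)
    return frac_a - frac_b


def rank_cogs(cogs, group_a, group_b):
    """Return (cog_id, index) pairs, most groupA-specific first."""
    scored = [
        (cog_id, specificity_index(members, group_a, group_b))
        for cog_id, members in cogs.items()
    ]
    # Sort by descending index; break ties by COG identifier.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def abundance_by_index(ranked, ndigits=1):
    """Count COGs per rounded index value (the data behind a histogram)."""
    return Counter(round(score, ndigits) for _, score in ranked)


# Toy data with made-up labels. Organism "X1" is in neither group,
# so it is ignored automatically by the set intersections above.
group_a = {"A1", "A2", "A3"}
group_b = {"B1", "B2"}
cogs = {
    "COG_X": {"A1", "A2", "A3", "X1"},  # all of groupA, none of groupB
    "COG_Y": {"A1", "B1", "B2"},        # one of groupA, all of groupB
    "COG_Z": {"B1", "B2"},              # none of groupA, all of groupB
}

ranked = rank_cogs(cogs, group_a, group_b)
histogram = abundance_by_index(ranked)
```

With this toy input, `COG_X` ranks first because all groupA organisms and no groupB organisms contribute proteins to it. Counting COGs per index value, as `abundance_by_index` does, gives the kind of distribution that the graphical visualization displays.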