With the appearance of complete or sizeable parts of various genomes, the functional composition of organisms can be analysed for the first time. We have reduced the problem of protein function classification to a simple scheme with three classes of protein function: energy‐, information‐ and communication‐associated proteins. Finer classification schemes can be easily mapped to these three classes. To deal with the vast amount of information, a system for automatic function classification using database annotations has been developed. The system correctly classifies about 80% of the query sequences that have annotations. Using this system, we can analyse samples from the genomes of the most represented species in sequence databases and compare their genomic composition. The similarities and differences between taxonomic groups are strikingly intuitive. Viruses have the highest proportion of proteins involved in the control and expression of genetic information. Bacteria have the highest proportion of their genes dedicated to the production of proteins associated with small molecule transformations and transport. Animals have a very large proportion of proteins associated with intra‐ and intercellular communication and other regulatory processes. In general, the proportion of communication‐related proteins increases during evolution, indicating trends that led to the emergence of the eukaryotic cell and later the transition from unicellular to multicellular organisms.
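As an illustration only, classifying by database annotation and then comparing genomic composition could be sketched as a keyword vote over annotation text. The abstract does not describe the actual system at this level of detail, so the keyword lists, the function names `classify` and `composition`, and the tie-handling rule below are all assumptions made for this sketch.

```python
from collections import Counter

# Hypothetical keyword vocabulary for the three classes named in the text.
# The real system's vocabulary is not given in the source.
CLASS_KEYWORDS = {
    "energy": ("metabolism", "biosynthesis", "transport", "dehydrogenase"),
    "information": ("transcription", "translation", "replication", "ribosomal"),
    "communication": ("receptor", "signal", "hormone", "regulat"),
}


def classify(annotation):
    """Return the class with the most keyword hits, or None if there are
    no hits or the top two classes tie (assumed behaviour)."""
    text = annotation.lower()
    votes = Counter()
    for cls, keywords in CLASS_KEYWORDS.items():
        votes[cls] = sum(1 for kw in keywords if kw in text)
    ranked = votes.most_common(2)
    best, best_hits = ranked[0]
    if best_hits == 0 or best_hits == ranked[1][1]:
        return None
    return best


def composition(annotations):
    """Fraction of classified annotations that fall into each class."""
    labels = [classify(a) for a in annotations]
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    return {cls: (counts[cls] / total if total else 0.0)
            for cls in CLASS_KEYWORDS}


if __name__ == "__main__":
    sample = [
        "DNA replication protein",
        "Sugar transport system permease",
        "Hormone receptor involved in signal transduction",
        "Hypothetical protein",
    ]
    for line in sample:
        print(line, "->", classify(line))
    print(composition(sample))
```

Running `composition` on the annotations sampled from each species would give per-class proportions that can be compared across taxonomic groups. Unannotated or ambiguous entries are left unclassified rather than forced into a class.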