Reconstructing history with amino acid sequences

The main goal of the protein evolutionist is the reconstruction of past events leading to the structures of contemporary proteins. The common strategy is to align amino acid sequences and make inferences about matters of common ancestry. The rate of change of amino acid sequence varies greatly from protein to protein, and this naturally affects how far back a given protein's ancestry can be traced. Happily, the rate of change of many proteins is slow enough that very ancient events can be inferred. Many mainstream metabolic enzymes, for example, are 40–50% identical in prokaryotes and eukaryotes, groups that diverged from a common ancestor more than 1.5 billion years ago. Moreover, some eukaryotic proteins like actin and tubulin change so slowly that they are seldom less than 60% identical, no matter from what source they are drawn. As it happens, prokaryotic counterparts for many eukaryotic cytoskeletal proteins are unknown. A recent exception involves the finding that a heat shock protein cognate is a relative of actin. The gene duplication that gave rise to these two proteins must have been an ancient event. The more recent invention of other proteins whose distribution is restricted to one or the other of the major kingdoms may be easier to trace. Among the factors that can confound the reconstruction of events, however, are occasional horizontal gene transfers and exon shuffling. The latter has led to a number of mosaic proteins, many of which contain various combinations of a relatively small set of modules like the epidermal growth factor domain.
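The identity figures quoted above (40–50%, 60%) are percentages of matching residues in an alignment. The following is a minimal sketch of that arithmetic, not the procedure used in the work summarized here. It assumes the two sequences have already been aligned to equal length with `-` as the gap character, and it counts only columns where both sequences carry a residue; other conventions for the denominator exist. The function name and the toy sequences are invented for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumption: the inputs are already aligned (equal length, gaps marked
    with `gap`). Columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns where both sequences have a residue
    matches = 0   # compared columns where the residues are identical

    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example (hypothetical sequences, not real proteins):
# 8 columns compared, 6 identical.
print(percent_identity("MKTAYIAK", "MKSAYLAK"))
```

Because gapped columns are excluded here, a short well-conserved region can score highly even when the full-length proteins differ substantially, so the length of the compared region is worth reporting alongside the percentage.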
