Comparative genome analysis to reveal protein evolution.

The completion of a large number of genome sequencing projects has produced more than a million protein sequences. Recent advances in computing and bioinformatics make it possible to analyse these sequences. This thesis describes a novel automated protein classification protocol that groups proteins into families and identifies protein domain architectures through domain assignment. The resulting data are presented in the Gene3D database, which is used for all subsequent analysis. The distribution of protein family and protein domain data is power-law-like, as is typical of many biological data sets and indicative of the small-world networks that underlie biological systems. The distribution of superfamilies and protein families in Gene3D across kingdoms is used to describe the evolutionary mechanisms that determine genome diversity through protein diversity. Domain occurrence profiles are used to identify protein domain superfamilies whose occurrence is correlated with genome size in bacteria. These superfamilies are shown to exhibit a balance between metabolic and regulatory roles, following microeconomic principles that may determine bacterial genome size. The domain families identified in Gene3D enable an estimate of the total number of protein folds in nature. Sub-clustering of domain families yields sub-cluster occurrence profiles, which are shown to detect correlations and anti-correlations between domain families that are undetectable using superfamily occurrence profiles alone. Clusters of correlated domain sub-clusters are shown to identify functionally linked clusters of proteins. Finally, the data in Gene3D are used to functionally annotate the CATH database and to provide functional predictions for unannotated proteins, giving a more comprehensive functional repertoire and greater accuracy than other functional prediction methods.
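
As a rough illustration of the idea behind occurrence profiles, the sketch below correlates the per-genome copy number of a few domain superfamilies with genome size. It is not the protocol used in the thesis: the superfamily names, the counts, the genome sizes and the choice of Pearson correlation are all assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant profile carries no information about genome size.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlate_with_genome_size(profiles, genome_sizes):
    """Map each superfamily to the correlation of its occurrence profile
    (one count per genome) with the corresponding genome sizes."""
    return {name: pearson(counts, genome_sizes)
            for name, counts in profiles.items()}


# Hypothetical toy data: four genomes, sizes given as gene counts.
genome_sizes = [1000, 2000, 4000, 8000]
profiles = {
    "superfamily_A": [10, 20, 40, 80],  # grows in proportion to genome size
    "superfamily_B": [5, 5, 5, 5],      # same copy number in every genome
    "superfamily_C": [8, 6, 4, 1],      # shrinks as genomes get larger
}

result = correlate_with_genome_size(profiles, genome_sizes)
for name, r in sorted(result.items()):
    print(f"{name}: r = {r:.3f}")
```

In this toy setting, a superfamily whose copy number tracks genome size scores close to 1, a constant one scores 0, and one that becomes rarer in larger genomes scores negative. The same function applied to pairs of sub-cluster profiles, rather than to a profile and genome size, would give one simple way to look for the correlations and anti-correlations between domain families described above.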
