A historical perspective on gene/protein functional assignment

Sequence determination and analysis began on proteins in the 1950s, with RNA following about a decade later and DNA a similar period after that. Hence many of the concepts for function prediction were first developed on amino acid sequences. Over time these methods have become much more sophisticated, allowing better discrimination of even weak similarities. The most recent developments examine contextual information, such as operon structure, metabolic reconstruction or co-expression profiles.
