Artificial Intelligence and Molecular Biology

Molecular biology is emerging as an important domain for artificial intelligence research. The advantages of biology for the design and testing of AI systems include large amounts of available online data, significant (but incomplete) background knowledge, a wide variety of problems commensurate with AI technologies, clear standards of success, cooperative domain experts, non-military basic research support, and perceived potential for practical (and profitable) applications. These considerations have motivated a growing group of researchers to pursue both basic and applied AI work in the domain. More than seventy-five researchers working on these problems gathered at Stanford for an AAAI-sponsored symposium on the topic. This article describes much of the work presented at the meeting and fills in the basic biology background necessary to place it in context.
