Massively Parallel Symbolic Induction of Protein Structure/Function Relationships

This paper reports the development and implementation of efficient algorithms for several symbolic machine learning induction operators on a massively parallel computer. The operators are invoked as hardware induction subroutines under the control of a higher-level front-end LISP program. The key contribution of the work is its demonstration that the algorithms scale: the time complexity of the induction algorithms is essentially independent of the total size of the instance data pool, with essentially linear space (hardware) complexity. Everything described has been implemented in Common LISP or PARIS; the PARIS portion runs on a CM-2 Connection Machine. The system (ARIEL) has been applied by domain experts to the DNA polymerases and to the transcriptional activators.
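The scaling claim can be illustrated with a minimal sketch, which is not ARIEL's actual code. It assumes a hypothetical pattern language (a string in which `.` matches any residue) and hypothetical names `Instance`, `matches`, and `score`. On data-parallel hardware each instance would sit on its own processor, so the per-instance match step would take time independent of the pool size while the processor count grows linearly with it; plain Python can only simulate that step with a sequential loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One labeled example in the instance pool (hypothetical representation)."""
    sequence: str   # e.g. an amino-acid string
    positive: bool  # True if the instance belongs to the target class


def matches(pattern: str, sequence: str) -> bool:
    """Return True if the pattern occurs at any offset; '.' matches any residue."""
    n, m = len(sequence), len(pattern)
    return any(
        all(p == "." or p == sequence[i + j] for j, p in enumerate(pattern))
        for i in range(n - m + 1)
    )


def score(pattern: str, pool: list[Instance]) -> tuple[int, int]:
    """Count (positive matches, negative matches) for a candidate pattern.

    Step 1 is the data-parallel part: conceptually one processor per
    instance computes its own match flag. Here it is a sequential loop.
    Step 2 is a global reduction over the flags.
    """
    flags = [matches(pattern, inst.sequence) for inst in pool]
    true_pos = sum(1 for f, inst in zip(flags, pool) if f and inst.positive)
    false_pos = sum(1 for f, inst in zip(flags, pool) if f and not inst.positive)
    return true_pos, false_pos


# Toy pool with made-up sequences, for illustration only.
pool = [
    Instance("GKSGT", True),
    Instance("AGKTA", True),
    Instance("AAAAA", False),
]
print(score("GK.", pool))
```

A front-end program could call a routine like `score` repeatedly while searching over candidate patterns, which mirrors the division of labor described above: the search control stays sequential and only the per-instance work is parallelized.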
