Artificial Neural Networks for Molecular Sequence Analysis

Artificial neural networks provide a unique computing architecture whose potential has attracted interest from researchers across different disciplines. As a technique for computational analysis, neural network technology is well suited to the analysis of molecular sequence data. It has been applied successfully to a variety of problems, ranging from gene identification to protein structure prediction and sequence classification. This article provides an overview of the major neural network paradigms, discusses design issues, and reviews current applications in DNA/RNA and protein sequence analysis.
