Artificial intelligence techniques for bioinformatics.

This review provides an overview of the ways in which techniques from artificial intelligence (AI) can be usefully employed in bioinformatics, both for modelling biological data and for making new discoveries. The paper covers three techniques: symbolic machine learning approaches (nearest neighbour and identification tree techniques), artificial neural networks and genetic algorithms. Each technique is introduced and supported with examples taken from the bioinformatics literature. These examples include folding prediction, viral protease cleavage prediction, classification, multiple sequence alignment and microarray gene expression analysis.
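As a rough illustration of the nearest neighbour idea named above, the following minimal sketch labels a peptide by majority vote among its closest training sequences. It is not the method of any paper covered by the review: the Hamming distance, the function names `hamming` and `knn_classify`, the choice of k = 3 and the toy octapeptides with their labels are all assumptions made for the example.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k nearest training sequences."""
    # Sort training examples by distance to the query and keep the k closest.
    nearest = sorted(training, key=lambda item: hamming(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up octapeptides; the labels are illustrative only, not real data.
TRAINING = [
    ("AAAAFFAA", "cleaved"),
    ("AAAAFLAA", "cleaved"),
    ("AAGAFFAA", "cleaved"),
    ("GGGGKKGG", "uncleaved"),
    ("GGGGKRGG", "uncleaved"),
    ("GGSGKKGG", "uncleaved"),
]

if __name__ == "__main__":
    print(knn_classify("AAAAFFAG", TRAINING))
```

In practice a residue substitution matrix or a structural distance would probably replace the plain mismatch count, and k would be chosen by cross-validation.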
