Protein classification artificial neural system

A neural network classification method has been developed as an alternative approach to the problem of searching and organizing large databases. The system, termed the Protein Classification Artificial Neural System (ProCANS), has been implemented on a Cray supercomputer for rapid superfamily classification of unknown proteins based on the information content of the neural interconnections. The system encodes sequences with an n‐gram hashing function that is similar to the k‐tuple method. A collection of modular back‐propagation networks is used to store the large number of sequence patterns. The system has been trained and tested with the first 2,148 of the 8,309 entries of the annotated Protein Identification Resource protein sequence database (release 29). These entries include the electron transfer proteins and the six enzyme groups (oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases), with a total of 620 superfamilies. After a total training time of seven Cray central processing unit (CPU) hours, the system reached a predictive accuracy of 90%. Classification is fast (0.1 Cray CPU second per sequence) because it involves only a forward pass through the networks. The classification time on a full‐scale system covering all known superfamilies is estimated to be within 1 CPU second. Although the training time will grow linearly with the number of entries, the classification time is expected to remain low even with a 10–100‐fold increase in sequence entries. The neural database, which consists of the weight matrices of the networks, can be ported together with the ProCANS software to other computers and made available to the genome community. Rapid and accurate superfamily classification would be valuable for organizing protein sequence databases and for gene recognition in large sequencing projects.
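
The n‐gram encoding mentioned above can be illustrated with a minimal Python sketch. It is not the ProCANS implementation: the function name `ngram_encode`, the 20‐letter amino acid alphabet, the choice of n = 2, and the normalization to relative frequencies are all assumptions made for illustration. The idea shown is only that every possible n‐gram is mapped to a fixed position in a vector, so that sequences of any length become fixed‐length inputs for a network.

```python
from itertools import product

# Assumed alphabet: the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_encode(sequence, n=2, alphabet=AMINO_ACIDS):
    """Map a sequence to a fixed-length vector of n-gram frequencies.

    Hypothetical helper for illustration; the encoding used by ProCANS
    may differ in alphabet, n, and scaling.
    """
    # Assign each possible n-gram a fixed position in the vector.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=n))}
    counts = [0.0] * len(index)

    seq = sequence.upper()
    # Slide a window of width n over the sequence and count each n-gram.
    for start in range(len(seq) - n + 1):
        pos = index.get(seq[start:start + n])
        if pos is not None:  # skip n-grams with letters outside the alphabet
            counts[pos] += 1.0

    # Normalize to relative frequencies so sequence length does not dominate.
    total = sum(counts)
    if total > 0:
        counts = [c / total for c in counts]
    return counts


vec = ngram_encode("ACAC", n=2)
print(len(vec))  # vector length is 20**2 for n = 2
```

With n = 2 the vector has 400 positions regardless of sequence length, which is what allows a feed‐forward network with a fixed input layer to classify sequences of varying size.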
