Data mining for building neural protein sequence classification systems with improved performance

Traditionally, two protein sequences are assigned to the same class if their feature patterns show high homology. These feature patterns were originally extracted by sequence alignment algorithms, which measure the similarity between an unseen protein sequence and previously identified protein sequences. Neural network approaches, while reasonably accurate at classification, give biologists no useful information about the relationship between the unseen sequence and the sequences already classified. In contrast, in this paper we use a generalized radial basis function (GRBF) neural network architecture that generates fuzzy classification rules, which can be used for further knowledge discovery. Our proposed techniques were evaluated on protein sequences from ten superfamily classes downloaded from a public domain database, and the results compared favorably with those of other standard machine learning techniques.
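As a rough illustration of why a radial basis function unit can be read as a fuzzy rule, the sketch below treats each hidden unit as a Gaussian membership function per feature and combines the memberships with a product (a fuzzy AND). This is not the GRBF architecture of the paper: the function names, the two toy rules, the feature names `f1` and `f2`, and the sum-of-firing-strengths decision are all assumptions made for the example.

```python
import math

def firing_strength(x, center, width):
    """Gaussian membership per feature, combined by product (fuzzy AND)."""
    return math.prod(
        math.exp(-((xi - ci) ** 2) / (2.0 * wi ** 2))
        for xi, ci, wi in zip(x, center, width)
    )

def classify(x, rules):
    """Sum the firing strengths per class; return the best class and all scores."""
    scores = {}
    for center, width, label in rules:
        scores[label] = scores.get(label, 0.0) + firing_strength(x, center, width)
    best = max(scores, key=scores.get)
    return best, scores

def rule_to_text(center, width, label, names):
    """Render one hidden unit as a human-readable fuzzy rule."""
    parts = [
        f"{n} is about {c:g} (spread {w:g})"
        for n, c, w in zip(names, center, width)
    ]
    return "IF " + " AND ".join(parts) + f" THEN class {label}"

# Hypothetical toy rules: (center, width, class label) over two made-up features.
RULES = [
    ((0.2, 0.8), (0.1, 0.2), "A"),
    ((0.7, 0.3), (0.2, 0.1), "B"),
]
FEATURES = ("f1", "f2")

if __name__ == "__main__":
    label, scores = classify((0.25, 0.75), RULES)
    print(label, scores)
    for center, width, cls in RULES:
        print(rule_to_text(center, width, cls, FEATURES))
```

Because each unit has an explicit center and spread per feature, the same parameters that drive the classification can be printed as a rule, which is the kind of interpretability the abstract contrasts with conventional neural classifiers.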
