Optimizing Discriminant Model for Improved Classification of Protein

Classifiers based on discriminant models achieve the highest accuracy among protein classification methods for remote homology detection, but all of them suffer from imbalanced training sets, in which negative samples far outnumber positive ones. This paper presents a protein classification method based on an optimized discriminant model, which further improves classifier performance by assigning different penalty coefficients to positive and negative samples so that the two classes carry balanced weight in training. Comparative experiments show that the optimized discriminant model obtains higher accuracy, and that the same parameter optimization can improve the performance of all classifiers based on a discriminant model.
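The idea of class-specific penalty coefficients can be illustrated with a minimal sketch. It assumes the discriminant model is a soft-margin linear SVM, which the abstract does not state, and it uses a common inverse-frequency heuristic to choose the penalties rather than the optimization procedure of the paper. The function names `class_penalties` and `weighted_hinge_objective` and the parameter `base_c` are hypothetical.

```python
from collections import Counter


def class_penalties(labels, base_c=1.0):
    """Return one penalty per class, inversely proportional to class frequency.

    Assumption: an inverse-frequency heuristic, not the paper's own scheme.
    With this choice every class contributes the same total penalty weight.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {y: base_c * n_samples / (n_classes * c) for y, c in counts.items()}


def weighted_hinge_objective(w, b, xs, labels, penalties):
    """Soft-margin objective with a separate penalty for each class:

    0.5 * ||w||^2 + sum_i C[y_i] * max(0, 1 - y_i * (w . x_i + b))
    """
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for x, y in zip(xs, labels):
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        # A margin violation costs more for the rarer class.
        loss += penalties[y] * max(0.0, 1.0 - margin)
    return reg + loss


# Toy imbalanced training set: 2 positive and 8 negative samples.
labels = [1, 1, -1, -1, -1, -1, -1, -1, -1, -1]
penalties = class_penalties(labels)
print(penalties)
```

With these penalties a misclassified positive sample costs four times as much as a misclassified negative one, so the few positives are not overwhelmed by the many negatives when the objective is minimized.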
