Multi-criteria optimization classifier using fuzzification, kernel and penalty factors for predicting protein interaction hot spots

[Figure: Classification simulation using SVM, fuzzy SVM, and FKP-MCOC classifiers.]

An improved FKP-MCO classifier based on a fuzzification method, a kernel technique, and penalty factors is proposed and used for predicting protein-protein interaction hot spots. A fuzzy contribution of each input point is introduced into the MCO classifier for soft separation. The penalty factors balance overfitting to the majority class against underfitting of the minority class in the dataset. FKP-MCOC avoids solving a quadratic programming problem, which improves efficiency. FKP-MCOC predicts active compounds in bioassays and protein interaction hot spots better than MCOC and other classifiers in terms of stability, separation, and generalization.

To understand the patterns of various biological processes and discover the principles of protein-protein interactions (PPI), it is important to develop effective methods for accurately identifying and predicting PPI and their hot spots. A multi-criteria optimization classifier (MCOC) learns a decision function from training data of different classes and uses it to predict the class labels of unknown samples. In many real-world applications, the predictive performance of MCOC degrades rapidly owing to noise, outliers, imbalanced class distributions, nonlinearly separable problems, and other uncertainties. In this paper, we introduce a fuzzy contribution for each training instance, unequal penalty factors for the samples in imbalanced classes, and a kernel method for nonlinearly separable datasets. The resulting novel multi-criteria optimization classifier with fuzzification, kernel, and penalty factors (FKP-MCOC) is constructed to reduce the effects of anomalies, improve performance on imbalanced classes, and handle nonlinear separability in classification. Experimental results on predicting active compounds and protein interaction hot spots, compared with MCOC, support vector machines (SVM), and fuzzy SVM, show that FKP-MCOC significantly improves the efficiency of classification, the partition of active and inactive compounds in bioassays, the separation of hot spot residues from energetically unimportant residues in protein interactions, and the generalization of predicting active compounds and hot spot residues in new instances.
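The three ingredients named above (a fuzzy contribution per training instance, unequal penalty factors per class, and a kernel) can be illustrated with a minimal sketch. This is not the FKP-MCOC formulation itself, which the abstract does not spell out. It assumes a centroid-distance membership function of the kind commonly used in fuzzy SVMs, penalty factors inversely proportional to class size, and an RBF kernel. All function names and the parameters `delta`, `base`, and `gamma` are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dim)]


def fuzzy_memberships(points, delta=1e-6):
    """Assumed membership rule: 1 - dist_to_centroid / (class_radius + delta).

    Points far from their class centroid (possible outliers) get a
    contribution close to 0; points near the centroid get one close to 1.
    """
    c = centroid(points)
    dists = [math.dist(p, c) for p in points]
    radius = max(dists)
    return [1.0 - d / (radius + delta) for d in dists]


def class_penalties(labels, base=1.0):
    """Assumed penalty rule: inversely proportional to class frequency,
    so errors on the minority class cost more than errors on the majority."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    k = len(counts)
    return {y: base * n / (k * c) for y, c in counts.items()}


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel, one common choice for nonlinear separation."""
    return math.exp(-gamma * math.dist(x, z) ** 2)


def instance_weights(X, y):
    """Combine class penalty and fuzzy membership into one weight per sample."""
    penalties = class_penalties(y)
    weights = [0.0] * len(X)
    for label in set(y):
        idx = [i for i, yi in enumerate(y) if yi == label]
        memberships = fuzzy_memberships([X[i] for i in idx])
        for i, m in zip(idx, memberships):
            weights[i] = penalties[label] * m
    return weights


if __name__ == "__main__":
    # Toy imbalanced data: three majority samples (one far from the rest)
    # and one minority sample.
    X = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [5.0, 5.0]]
    y = [-1, -1, -1, 1]
    print(instance_weights(X, y))
    print(rbf_kernel(X[0], X[1]))
```

In a classifier of this family, weights like these would typically scale each sample's slack or deviation term in the objective, and the kernel would replace inner products between samples. How FKP-MCOC combines them exactly is defined in the paper, not here.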
