Quantitative prediction of peptide binding affinity by using hybrid fuzzy support vector regression

Highlights

- High-dimensional biological data sets are modelled with a regression-based fuzzy system.
- An SVR-based fuzzy model is proposed to predict the degree of peptide binding to MHC molecules.
- SVR is enhanced by incorporating the concept of fuzziness.
- The TSK fuzzy system benefits from SVR-based training.
- Results from the proposed models suggest improved predictive ability and performance.

Support vector machines are widely used for prediction problems in the life sciences and have been shown to generalise well in input-output mapping. However, the performance of predictive models is often degraded by the complex, high-dimensional, and non-linear nature of post-genome data. Soft computing methods can be used to model such non-linear systems. Fuzzy systems are among the most widely used soft computing methods for modelling uncertainty; they consist of interpretable rules that help one gain insight into the applied model. This study therefore aims to provide a more interpretable and efficient biological model by developing a hybrid method that integrates a fuzzy system with support vector regression. To demonstrate the robustness of this hybrid method, it is applied to the prediction of peptide binding affinity, one of the most challenging problems in the post-genomic era owing to the diversity of peptide families and the complexity and high dimensionality of the peptides' characteristic features. Across four case studies, the hybrid predictive model yielded the highest predictive power in every case and achieved an improvement of as much as 34% over the results reported in the literature. Availability: Matlab scripts are available at https://github.com/sekerbigdatalab/tsksvr.
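As a rough illustration of the two ingredients the abstract combines, the sketch below shows first-order TSK (Takagi-Sugeno-Kang) fuzzy inference alongside the epsilon-insensitive loss that SVR minimises. It is not the authors' Matlab implementation: the Gaussian membership functions, the product rule for firing strengths, the rule parameters, and all function names are assumptions chosen for illustration, and no SVR training step is included.

```python
import math


def gaussian(x, centre, width):
    """Gaussian membership degree of x (assumed membership shape)."""
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))


def firing_strength(x, centres, widths):
    """Rule firing strength: product of per-feature membership degrees."""
    strength = 1.0
    for xi, c, w in zip(x, centres, widths):
        strength *= gaussian(xi, c, w)
    return strength


def tsk_predict(x, rules):
    """First-order TSK output: firing-strength-weighted average of the
    linear consequents (bias + weights . x) of all rules."""
    num = 0.0
    den = 0.0
    for rule in rules:
        s = firing_strength(x, rule["centres"], rule["widths"])
        y = rule["bias"] + sum(w * xi for w, xi in zip(rule["weights"], x))
        num += s * y
        den += s
    # Fallback of 0.0 when no rule fires is an arbitrary choice for this sketch.
    return num / den if den > 0.0 else 0.0


def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """SVR loss: errors smaller than eps cost nothing, larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Two hypothetical rules over two features (values are made up).
rules = [
    {"centres": [0.0, 0.0], "widths": [1.0, 1.0], "weights": [1.0, 0.0], "bias": 0.0},
    {"centres": [2.0, 2.0], "widths": [1.0, 1.0], "weights": [0.0, 1.0], "bias": 1.0},
]

prediction = tsk_predict([1.0, 1.0], rules)
loss = eps_insensitive_loss(1.45, prediction, eps=0.1)
print(prediction, loss)
```

In a hybrid scheme of the kind the abstract describes, the consequent parameters (here `weights` and `bias`) would be fitted by an SVR-style optimiser rather than set by hand; how the authors do this is documented in their repository, not in this sketch.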
