Hyperparameter Estimation in SVM with GPU Acceleration for Prediction of Protein-Protein Interactions

For classification tasks such as the prediction of protein-protein interactions (PPIs), support vector machines (SVMs) have long been utilised as a standard machine learning model. However, most PPI classification work is limited to small datasets with low feature dimensions, owing to the heavy computational burden of kernel function evaluation and the quadratic optimization in SVM training. Once the dataset becomes larger, practitioners tend to fall back on a linear model, which forfeits the potential of kernel functions. Since there are several kernel definitions and various groups of hyperparameters, estimating the best set of hyperparameters by traditional grid search is tremendously time-consuming for PPI classification. To address this challenge, we present a more efficient solution for hyperparameter estimation that uses GPU acceleration: kernel function calculations are accelerated so that SVMs can be trained efficiently and accurately on various PPI datasets. The experiments are first conducted on the PPI classification task, and we further evaluate the effectiveness of the solution on five public classification datasets. Our solution is faster and more accurate than the state of the art.
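
To make the cost of traditional grid search concrete, the following is a minimal sketch of how the number of SVM trainings grows with the kernels and hyperparameter values being searched. The kernels, value ranges, fold count, and the `evaluate` callback are illustrative assumptions and are not taken from the paper; in practice `evaluate` would train and cross-validate an SVM (on CPU or GPU) and return its score.

```python
from itertools import product

# Hypothetical search space: kernels and value ranges are illustrative
# assumptions, not the grid used in the paper.
SEARCH_SPACE = {
    "linear": {"C": [0.1, 1, 10, 100]},
    "rbf": {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]},
    "poly": {
        "C": [0.1, 1, 10, 100],
        "gamma": [1e-3, 1e-2, 1e-1, 1],
        "degree": [2, 3],
    },
}


def iter_candidates(space):
    """Yield every (kernel, params) combination in the grid."""
    for kernel, grid in space.items():
        names = sorted(grid)
        for values in product(*(grid[name] for name in names)):
            yield kernel, dict(zip(names, values))


def count_fits(space, folds=5):
    """Number of SVM trainings needed for k-fold cross-validated grid search."""
    return folds * sum(1 for _ in iter_candidates(space))


def grid_search(space, evaluate):
    """Return (best_score, kernel, params).

    `evaluate(kernel, params)` is a caller-supplied function that returns a
    score where higher is better; it stands in for SVM training plus
    validation, which this sketch does not implement.
    """
    best = None
    for kernel, params in iter_candidates(space):
        score = evaluate(kernel, params)
        if best is None or score > best[0]:
            best = (score, kernel, params)
    return best


if __name__ == "__main__":
    print("candidates:", sum(1 for _ in iter_candidates(SEARCH_SPACE)))
    print("fits with 5-fold CV:", count_fits(SEARCH_SPACE, folds=5))
```

Because every candidate requires a full kernel SVM training per fold, the per-training cost multiplies across the whole grid, which is why accelerating the kernel computation matters for this kind of search.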
