Tyrosine Kinase Ligand-Receptor Pair Prediction by Using Support Vector Machine

Receptor tyrosine kinases are essential proteins involved in cellular differentiation and proliferation in vivo, and they are heavily implicated in allergic diseases, diabetes, and the onset and proliferation of cancerous cells. Identifying the growth factor ligands that interact with these receptors will provide a deeper understanding of cellular proliferation, differentiation, and other cell processes. In this study, we developed a method for predicting tyrosine kinase ligand-receptor pairs from their amino acid sequences. We collected tyrosine kinase ligand-receptor pairs from the Database of Interacting Proteins (DIP) and UniProtKB, removed redundant sequences, and used the resulting dataset for machine learning and for assessing predictive performance. Our prediction method is based on support vector machines (SVMs). We evaluated several input features suited to tyrosine kinases and compared and analyzed the results. Using sequence pattern information and domain information extracted from the sequences as input features, we obtained an area under the receiver operating characteristic curve (AUC) of 0.996. This performance is higher than that obtained from general protein-protein interaction pair predictions.
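As a rough illustration of the two ideas named above, encoding a ligand-receptor pair from sequence and scoring predictions by AUC, the following minimal sketch may help. It is not the authors' implementation: overlapping k-mer counts are assumed here as a simple stand-in for "sequence pattern information", domain features are omitted, and the function names `kmer_counts`, `pair_features`, and `roc_auc` are hypothetical. The resulting feature vectors could be passed to any SVM implementation for training.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(sequence, k=2):
    """Count overlapping k-mers in one amino acid sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def pair_features(ligand, receptor, k=2):
    """Concatenate ligand and receptor k-mer count vectors.

    Assumption: a pair is encoded by simple concatenation; the study's
    actual encoding is not described in the abstract.
    """
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    lig = kmer_counts(ligand, k)
    rec = kmer_counts(receptor, k)
    return [lig[m] for m in vocab] + [rec[m] for m in vocab]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy sequences and scores, made up purely for illustration.
    features = pair_features("ACAC", "GG", k=2)
    print(len(features), sum(features))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

With k = 2 each protein contributes 400 dimensions, so a pair is represented by an 800-dimensional vector. The pairwise definition of AUC used here is quadratic in the number of examples, which is fine for a small dataset but would be replaced by a rank-based computation at scale.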
