Combining SVM and ECOC for Identification of Protein Complexes from Protein Protein Interaction Networks by Integrating Amino Acids’ Physical Properties and Complex Topology

Protein complexes play an important role in key functional processes in cells by forming protein-protein interaction (PPI) networks. Conventionally, they have been determined through experimental approaches. To save time and reduce cost, many computational methods have been proposed. Few computational approaches take into account the significant biological information contained in protein amino acid sequences, and existing methods identify dense subgraphs as complexes in the PPI network by considering density and degree statistics. Biological information captures the features that two proteins share for performing a particular biological function. Moreover, linear, star and hybrid subgraph structures may be found in a PPI network, so other topological features of the graph are also important. In this article, a support vector machine (SVM) in combination with the error-correcting output coding (ECOC) algorithm is used to construct an automatic detector for mining multiple protein complexes from a PPI network, where amino acid physical properties (Kidera factors) and a variety of topological constraints are employed as feature vectors. The overall success rates of protein complex identification are 88.6% and 76.0% on the MIPS benchmark set when considering DIP and Gavin interactions, respectively. The support vector machine is an effective and solid approach for complex detection with amino acids’ physical properties and complex topology as feature vectors. ECOC is a powerful algorithm for mining multiple protein complexes of small as well as large sizes. The accuracy of complex identification based on amino acids’ physical and complex topological characteristics increases strikingly when ECOC is integrated with the SVM approach. Moreover, this paper suggests that the ECOC algorithm may succeed over a wide range of applications in biological data mining.
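As an illustration of why ECOC can help a set of binary classifiers handle several classes, the following minimal sketch shows only the decoding step. It assumes four classes and a 7-bit code matrix, both chosen here for illustration; the abstract does not state the code matrix, the number of classes, or the SVM settings actually used. Each column would correspond to one binary classifier (an SVM in the approach described above), and the bit vectors below stand in for their outputs.

```python
from typing import List, Sequence

# Hypothetical code matrix: one row (codeword) per class, one column per
# binary classifier. Any two rows differ in 4 positions.
CODE_MATRIX: List[List[int]] = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def min_row_distance(matrix: Sequence[Sequence[int]]) -> int:
    """Smallest Hamming distance between any two codewords."""
    return min(
        hamming(matrix[i], matrix[j])
        for i in range(len(matrix))
        for j in range(i + 1, len(matrix))
    )


def ecoc_decode(bits: Sequence[int],
                matrix: Sequence[Sequence[int]] = CODE_MATRIX) -> int:
    """Return the index of the class whose codeword is closest to `bits`."""
    distances = [hamming(bits, row) for row in matrix]
    return distances.index(min(distances))


if __name__ == "__main__":
    # Codeword of class 2 with the first bit flipped, i.e. one of the
    # seven binary classifiers made a mistake.
    noisy = [1, 0, 1, 1, 0, 0, 1]
    print("minimum codeword distance:", min_row_distance(CODE_MATRIX))
    print("decoded class:", ecoc_decode(noisy))
```

Because the minimum distance between codewords in this matrix is 4, a single wrong binary decision still decodes to the intended class. This redundancy is the general mechanism behind ECOC; how much it contributes in the paper's setting depends on details not given here.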
