Multi-Dimensional Scaling based grouping of known complexes and intelligent protein complex detection

Protein-protein interactions (PPIs) play a vital role in cellular processes, and PPI networks are formed from thousands of interactions among proteins. Advances in proteomics technologies have produced huge PPI datasets that need to be analyzed systematically. Protein complexes are the locally dense regions of PPI networks and play an important role in metabolic pathways and gene regulation. In this work, a novel two-phase mechanism for protein complex detection and grouping is proposed. In the first phase, topological and biological features are extracted for each complex, and prediction performance is investigated using a Bagging-based Ensemble classifier (PCD-BEns). Performance evaluation through cross-validation shows improvement over the CDIP, MCODE, CFinder and PLSMC methods. The second phase employs Multi-Dimensional Scaling (MDS) to group known complexes by exploring inter-complex relations. It is experimentally observed that combining topological and biological features in the proposed approach greatly enhances prediction performance for protein complex detection, which may help in understanding various biological processes, whereas MDS-based exploration may assist in grouping potentially similar complexes.
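The two phases described above can be illustrated with a minimal, standard-library-only sketch. It is not the PCD-BEns implementation: the base learner (a decision stump), the two toy features (a density-like value and a size), the fold layout, and the use of Euclidean distance between feature vectors as the inter-complex dissimilarity are all assumptions made for illustration, and the data are invented.

```python
import math
import random

def train_stump(X, y):
    """Fit a one-feature threshold rule by exhaustive search (assumed base learner)."""
    best = None  # (errors, feature, threshold, polarity)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                preds = [1 if polarity * (row[f] - t) >= 0 else 0 for row in X]
                errors = sum(p != label for p, label in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    return best[1:]

def stump_predict(stump, row):
    f, t, polarity = stump
    return 1 if polarity * (row[f] - t) >= 0 else 0

def bagging_fit(X, y, n_estimators=15, seed=0):
    """Phase 1: train each base learner on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Majority vote over the ensemble (n_estimators is odd, so no ties)."""
    votes = sum(stump_predict(m, row) for m in models)
    return 1 if 2 * votes > len(models) else 0

def cross_val_accuracy(X, y, k=4):
    """Plain k-fold cross-validation with interleaved folds."""
    correct = 0
    for start in range(k):
        test_idx = list(range(start, len(X), k))
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        models = bagging_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(bagging_predict(models, X[i]) == y[i] for i in test_idx)
    return correct / len(X)

def classical_mds(D, dims=2, iters=500):
    """Phase 2: classical (Torgerson) MDS via power iteration with deflation.

    Assumes D is symmetric and roughly Euclidean; a non-positive dominant
    eigenvalue yields an all-zero coordinate for that dimension.
    """
    n = len(D)
    sq = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    # Double-centred Gram matrix B = -1/2 * J * D^2 * J
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    rng = random.Random(0)
    for d in range(dims):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][d] = v[i] * math.sqrt(max(lam, 0.0))
        # Remove the component just found before extracting the next one
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

# Invented toy data: [density-like feature, size]; label 1 = complex, 0 = not.
X = [[0.9, 5], [0.8, 7], [0.85, 4], [0.95, 6],
     [0.2, 5], [0.3, 8], [0.1, 4], [0.25, 6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

models = bagging_fit(X, y)
acc = cross_val_accuracy(X, y)

# Assumed dissimilarity: Euclidean distance between feature vectors.
D = [[math.dist(a, b) for b in X] for a in X]
coords = classical_mds(D)
print("cross-validated accuracy:", acc)
print("2-D embedding of first complex:", coords[0])
```

In this sketch, complexes that land close together in the returned 2-D coordinates are candidates for the same group. A real pipeline would replace the toy features and the Euclidean dissimilarity with whatever topological and biological measures the study actually uses.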
