Complex detection from PPI data using ensemble method

Many algorithms have recently been proposed to detect protein complexes in protein–protein interaction (PPI) networks. Most proteins form complexes to carry out biological functions such as DNA transcription, mRNA translation and cell growth. Since proteins perform their tasks by interacting with each other, identifying these protein–protein interactions is an important task. Traditional clustering approaches for protein complex identification cannot cope with noisy and incomplete PPI data, and they depend on information from a single source. Because noise in interaction datasets hampers the detection of accurate protein complexes, we propose an ensemble approach for protein complex detection that incorporates Gene Ontology (GO) information during complex detection. Several baseline complex detection algorithms take the PPI network as input and each generates a set of candidate protein complexes. The proposed ensemble then combines these candidates through a consensus-building module to identify meaningful complexes. The complexes predicted by the ensemble are evaluated against a set of gold-standard protein complexes, and their biological relevance is assessed using a co-localization score.
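One way to picture the consensus step is with a co-association vote, as in the minimal sketch that follows. Each base algorithm "votes" for every protein pair it places in the same complex. A pair is kept only if enough base algorithms agree and the two proteins are sufficiently GO-similar. The connected components of the resulting graph become the predicted complexes.

This is an illustrative assumption, not the authors' consensus-building module. In particular:

- `go_jaccard` is a hypothetical stand-in for a proper GO semantic similarity measure.
- The thresholds `min_vote`, `min_go` and `min_size` are arbitrary.
- Connected components cannot yield overlapping complexes.

```python
from collections import defaultdict
from itertools import combinations


def co_association(clusterings):
    """Fraction of base clusterings in which each protein pair shares a complex."""
    counts = defaultdict(int)
    for complexes in clusterings:
        seen = set()  # count each pair at most once per base clustering
        for complex_ in complexes:
            for a, b in combinations(sorted(complex_), 2):
                seen.add((a, b))
        for pair in seen:
            counts[pair] += 1
    n = len(clusterings)
    return {pair: c / n for pair, c in counts.items()}


def go_jaccard(a, b, go_terms):
    """Hypothetical stand-in for GO semantic similarity: Jaccard overlap of annotation sets."""
    ta, tb = go_terms.get(a, set()), go_terms.get(b, set())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def consensus_complexes(clusterings, go_terms, min_vote=0.5, min_go=0.1, min_size=3):
    # Keep a pair only if enough base algorithms agree AND the proteins are GO-similar.
    adj = defaultdict(set)
    for (a, b), score in co_association(clusterings).items():
        if score >= min_vote and go_jaccard(a, b, go_terms) >= min_go:
            adj[a].add(b)
            adj[b].add(a)
    # Connected components of the consensus graph become the predicted complexes.
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        if len(comp) >= min_size:
            result.append(frozenset(comp))
    return result


# Toy usage: three hypothetical base algorithms' outputs on five proteins.
base = [
    [{"A", "B", "C"}, {"D", "E"}],
    [{"A", "B", "C", "D"}],
    [{"A", "B"}, {"C", "E"}],
]
go = {p: {"GO:0005634"} for p in "ABCDE"}
result = consensus_complexes(base, go)
print(result)
```

In the toy data only the pairs among A, B and C are supported by at least half of the base algorithms. Those three proteins therefore form the single consensus complex. A real ensemble would likely use weighted votes and an overlap-aware clustering step instead of plain connected components.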

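The two evaluation criteria in the abstract can be sketched concretely under some assumptions.

For matching against a gold standard, a predicted complex P is assumed here to match a reference complex G when the overlap score |P∩G|² / (|P|·|G|) reaches 0.2. This is a threshold commonly used in the complex-detection literature. Precision and recall then count the matched predictions and the matched reference complexes.

For co-localization, the sketch uses one common formulation: the share of localized proteins that fall in their complex's most frequent compartment, pooled over all complexes.

Function names and the one-compartment-per-protein simplification are assumptions for illustration, not the paper's exact protocol.

```python
from collections import Counter


def overlap_score(pred, gold):
    """Overlap |P∩G|^2 / (|P|*|G|) between a predicted and a reference complex."""
    inter = len(pred & gold)
    return inter * inter / (len(pred) * len(gold))


def match_stats(predicted, gold_standard, threshold=0.2):
    """Precision, recall and F-measure under an overlap-score matching threshold."""
    matched_pred = sum(
        any(overlap_score(p, g) >= threshold for g in gold_standard) for p in predicted
    )
    matched_gold = sum(
        any(overlap_score(p, g) >= threshold for p in predicted) for g in gold_standard
    )
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold_standard) if gold_standard else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def colocalization_score(predicted, localization):
    """Share of localized proteins lying in their complex's most common compartment.

    Assumes (for simplicity) a single compartment per protein; unlocalized proteins are skipped.
    """
    agree = total = 0
    for complex_ in predicted:
        locs = [localization[p] for p in complex_ if p in localization]
        if locs:
            agree += Counter(locs).most_common(1)[0][1]
            total += len(locs)
    return agree / total if total else 0.0


# Toy usage with hypothetical predictions, reference complexes and localizations.
predicted = [{"A", "B", "C"}, {"X", "Y"}]
gold = [{"A", "B", "C", "D"}, {"E", "F"}]
loc = {"A": "nucleus", "B": "nucleus", "C": "cytoplasm", "X": "nucleus", "Y": "nucleus"}
p, r, f = match_stats(predicted, gold)
coloc = colocalization_score(predicted, loc)
print(p, r, f, coloc)
```

Here {A, B, C} matches the first reference complex (overlap 9/12 = 0.75), while {X, Y} matches nothing. Precision and recall are therefore both 0.5. Four of the five localized proteins share their complex's majority compartment, giving a co-localization score of 0.8.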