A fast algorithm to identify coevolutionary patterns from protein sequences based on tree-based data structure

Knowing how proteins interact with each other is crucial to understanding their functional mechanisms. For this reason, CoFex was developed to predict protein-protein interactions (PPIs) computationally. However, the procedure CoFex uses to obtain coevolutionary patterns is inefficient, especially for large-scale PPI prediction: to count co-occurrences, it must traverse the entire protein sequence dataset once for every candidate coevolutionary pattern. To improve the efficiency of CoFex, we propose a novel tree-based data structure, CF-Tree, that integrates with CoFex and reduces its running time by traversing the sequence dataset only once in total. Experimental results show that CF-Tree is a promising tree-based data structure for identifying coevolutionary patterns more efficiently from the sequence information of proteins in different species.
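The abstract does not describe the internal layout of CF-Tree, so the following is only a minimal sketch of the general idea, assuming a plain prefix tree (trie) over sequence segments rather than the authors' actual structure. The names `Node`, `build_trie`, `support`, and `naive_support`, as well as the `max_len` cap on segment length, are hypothetical. The sketch contrasts a per-candidate scan of the dataset with a single pass that stores the counts for all segments up to `max_len` in a tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One trie node; the path from the root spells a sequence segment."""
    children: dict = field(default_factory=dict)
    count: int = 0        # number of sequences containing this segment
    last_seen: int = -1   # id of the last sequence that incremented count


def build_trie(sequences, max_len):
    """Index every segment of length <= max_len in one pass over the dataset."""
    root = Node()
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            node = root
            for residue in seq[start:start + max_len]:
                node = node.children.setdefault(residue, Node())
                # Count each sequence at most once per segment.
                if node.last_seen != seq_id:
                    node.last_seen = seq_id
                    node.count += 1
    return root


def support(root, segment):
    """Look up a candidate's count without touching the dataset again."""
    node = root
    for residue in segment:
        node = node.children.get(residue)
        if node is None:
            return 0
    return node.count


def naive_support(sequences, segment):
    """Baseline: one full scan of the dataset for each candidate."""
    return sum(segment in seq for seq in sequences)


if __name__ == "__main__":
    # Toy sequences, purely illustrative (not real protein data).
    toy = ["MKVLA", "KVLAM", "AAKV"]
    trie = build_trie(toy, max_len=3)
    for candidate in ["KV", "VLA", "MK", "XYZ"]:
        print(candidate, support(trie, candidate), naive_support(toy, candidate))
```

In this sketch, the tree is built once and each candidate lookup costs time proportional to the candidate's length, whereas the baseline rescans every sequence per candidate. Segments longer than `max_len` are not indexed, so `support` returns 0 for them.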
