Essential protein identification based on essential protein-protein interaction prediction by integrated edge weights

Essential proteins are crucial to cellular survival and development. Traditionally, they are identified by knock-out experiments, which are expensive and often fatal to the target organism. Computational prediction is therefore an important alternative approach to essential protein identification. In this research, we present a novel computational method, Integrated Edge Weights (IEW), which predicts the essentiality of proteins based on essential protein-protein interactions. Experimental results on three organisms, Saccharomyces cerevisiae (Yeast), Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegans), show that IEW achieves better performance than state-of-the-art methods in terms of precision-recall. Furthermore, we demonstrate that the highly ranked protein-protein interactions predicted by our approach tend to be biologically significant in the Yeast, E. coli, and C. elegans protein-protein interaction (PPI) networks.
