DiffSLC: A graph centrality method to detect essential proteins of a protein-protein interaction network

Identifying central genes and proteins in biomolecular networks provides credible candidates for pathway analysis, functional analysis, and essentiality prediction. Network centrality measures prioritize nodes and edges by their importance to the network topology, and they have helped identify critical genes and proteins in biomolecular networks. The proposed centrality measure, DiffSLC, predicts central and essential genes and proteins from a protein-protein interaction network. It combines the number of interactions of a protein with the coexpression values of the genes from which the interacting proteins were translated, using coexpression as a weighting factor to bias the identification of essential proteins. Potentially essential proteins with low node degree are promoted through eigenvector centrality. Thus, the gene coexpression values are used in conjunction with the eigenvector of the network’s adjacency matrix and the edge clustering coefficient to improve essentiality prediction. The outcome of this prediction is shown using three variations: (1) inclusion or exclusion of gene coexpression data, (2) impact of different coexpression measures, and (3) impact of different gene expression data sets. For a total of seven networks, DiffSLC is compared to other centrality measures using Saccharomyces cerevisiae protein interaction networks and gene expression data. The top-ranked proteins are also compared against the known essential genes from the Saccharomyces Gene Deletion Project; these comparisons show that DiffSLC detects more essential proteins and has a higher area under the ROC curve than the other methods compared. This makes DiffSLC a stronger alternative to other centrality methods for detecting essential genes using a protein-protein interaction network that obeys the centrality-lethality principle. DiffSLC is implemented using the igraph package in R and the networkx package in Python.
The Python package can be obtained from git.io/diffslcpy. The R implementation and the code to reproduce the analysis are available via git.io/diffslc.
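
The idea of blending coexpression-weighted interactions with eigenvector centrality can be illustrated with a small sketch. This is not the published DiffSLC implementation: the abstract does not give the exact formula, so the convex combination below, the mixing parameters `omega` and `beta`, the max-scaling step, the particular edge clustering coefficient definition, and the toy network with its coexpression values are all assumptions made for illustration. The sketch uses only the Python standard library rather than networkx, and it assumes an undirected network without self-loops and non-negative coexpression values.

```python
from math import sqrt

def edge_clustering(adj, u, v):
    """Edge clustering coefficient: triangles through (u, v) divided by
    min(deg(u) - 1, deg(v) - 1); defined as 0 when that minimum is 0.
    (One common definition; assumed here, not taken from the abstract.)"""
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if denom <= 0:
        return 0.0
    return len(adj[u] & adj[v]) / denom

def eigenvector_centrality(adj, iters=200, tol=1e-10):
    """Power iteration on A + I; the shift keeps the leading eigenvector of A
    but avoids oscillation on bipartite graphs."""
    x = {n: 1.0 for n in adj}
    for _ in range(iters):
        nxt = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        norm = sqrt(sum(val * val for val in nxt.values()))
        nxt = {n: val / norm for n, val in nxt.items()}
        if max(abs(nxt[n] - x[n]) for n in adj) < tol:
            return nxt
        x = nxt
    return x

def scale_to_unit(values):
    """Divide by the maximum so both components live on a comparable scale."""
    top = max(values.values()) or 1.0
    return {k: v / top for k, v in values.items()}

def combined_scores(edges, coexpr, omega=0.5, beta=0.5):
    """Hypothetical blend: beta * eigenvector + (1 - beta) * biased degree,
    where each edge contributes omega * coexpression + (1 - omega) * ECC."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    biased_degree = {
        u: sum(
            omega * coexpr.get(frozenset((u, v)), 0.0)
            + (1 - omega) * edge_clustering(adj, u, v)
            for v in adj[u]
        )
        for u in adj
    }
    biased_degree = scale_to_unit(biased_degree)
    eig = scale_to_unit(eigenvector_centrality(adj))
    return {n: beta * eig[n] + (1 - beta) * biased_degree[n] for n in adj}

# Toy network and made-up coexpression values (illustrative only).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
COEXPR = {
    frozenset(("A", "B")): 0.6,
    frozenset(("A", "C")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("C", "D")): 0.3,
    frozenset(("D", "E")): 0.1,
}

scores = combined_scores(EDGES, COEXPR)
for protein in sorted(scores, key=scores.get, reverse=True):
    print(protein, round(scores[protein], 3))
```

In this sketch, setting `beta` closer to 1 lets the eigenvector term dominate, which is the mechanism by which a low-degree protein attached to well-connected neighbours can rise in the ranking; setting `omega` to 0 corresponds to excluding coexpression data altogether.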
