Ortho_Sim_Loc: Essential protein prediction using orthology and priority-based similarity approach

Proteins are essential macromolecules of living organisms. However, not all proteins can be considered essential. The essentiality of a protein can therefore be predicted by computational methods rather than biological experiments, which saves both time and effort. Researchers have already proposed various computational approaches that successfully select essential proteins with different biological significance. Most of the experimental approaches, however, return more false negatives than others. To maintain prediction accuracy, a novel methodology, "Ortho_Sim_Loc", is proposed that combines orthology, similarity (using clustering and priority-based GO annotation), and subcellular localization. Ortho_Sim_Loc can predict a functionally enriched set of essential proteins. The predicted results are validated against other existing methods, such as various centrality measures and LIDC. The validation results show that Ortho_Sim_Loc performs better than other existing computational approaches.
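The abstract compares Ortho_Sim_Loc against centrality measures. As a rough illustration of how such a centrality baseline is commonly evaluated, the sketch below ranks proteins in a toy protein-protein interaction (PPI) network by degree centrality and counts how many of the top-k proteins fall in a reference set of known essential proteins. The network, the essential set, and the helper names (`degree_centrality`, `top_k`, `true_positives`) are hypothetical; the abstract does not describe the paper's actual evaluation protocol, which may differ.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return the degree (number of distinct interaction partners) of every protein."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {protein: len(partners) for protein, partners in neighbours.items()}


def top_k(scores, k):
    """Rank proteins by descending score (ties broken by name) and keep the first k."""
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:k]


def true_positives(predicted, essential):
    """Count how many predicted proteins appear in the reference essential set."""
    return sum(1 for p in predicted if p in essential)


# Toy PPI network and reference essential set (hypothetical data, for illustration only).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("E", "F")]
essential = {"A", "C", "F"}

scores = degree_centrality(edges)
predicted = top_k(scores, 3)
print(predicted, true_positives(predicted, essential))  # prints ['A', 'B', 'C'] 2
```

A combined method such as Ortho_Sim_Loc would replace the degree score with its own ranking; the top-k true-positive count then allows the two rankings to be compared on the same reference set.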
