Improved flower pollination algorithm for identifying essential proteins

Background: Essential proteins are necessary for the survival and development of cells. Identifying them helps to clarify the minimal requirements for cellular life, and it plays an important role in the study of disease genes and in drug design. With the development of high-throughput techniques, a large amount of protein-protein interaction (PPI) data has become available for predicting essential proteins at the network level. Although a number of essential protein discovery methods have been proposed, their prediction precision still needs to be improved.

Methods: In this paper, we propose a new algorithm, named FPE, which applies an improved Flower Pollination Algorithm (FPA) to the identification of Essential proteins. Unlike existing essential protein discovery methods, FPE uses FPA, a recent intelligent algorithm that imitates the pollination behavior of flowering plants in nature. Just as flower pollination seeks optimal reproduction from the perspective of biological evolution, the identification of essential proteins seeks a candidate essential protein set. We analyze the correspondence between FPA and the prediction of essential proteins, and we redefine the positions of flowers and the specific pollination process accordingly. Moreover, it has been shown that integrating biological and topological properties improves the precision of essential protein identification. Consequently, we develop a GSC measurement to judge the essentiality of proteins, which takes into account not only Gene expression data, Subcellular localization and protein Complexes information, but also the network topology.

Results: The experimental results show that FPE performs better than the state-of-the-art methods (DC, SC, IC, EC, LAC, NC, PeC, WDC, UDoNC and SON) in terms of prediction precision, precision-recall curve and jackknife curve for identifying essential proteins, and that it also has high stability.

Conclusions: We confirm that FPE can effectively identify essential proteins by using the nature-inspired algorithm FPA and by combining network topology with gene expression data, subcellular localization and protein complexes information. The experimental results show the superiority of FPE for the prediction of essential proteins.
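
To make the evaluation criterion concrete, the following is a minimal sketch of how prediction precision over a top-ranked candidate set could be computed. It uses degree centrality (DC, one of the baselines named above) as the scoring function. It is not the FPE algorithm or the GSC measurement, whose details are not given in this abstract. The function names, the tie-breaking rule and the toy network are assumptions made purely for illustration.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {protein: number of distinct interaction partners}."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


def top_k_precision(scores, essential, k):
    """Fraction of the k highest-scoring proteins that are known essential."""
    # Sort by descending score; ties are broken by name (an arbitrary choice).
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(p in essential for p in top) / len(top)


# Toy PPI network and essential set, invented for illustration only.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
toy_essential = {"A", "C"}

dc = degree_centrality(toy_edges)
precision_at_2 = top_k_precision(dc, toy_essential, k=2)
print(precision_at_2)  # top two are A and B, one of which is essential: 0.5
```

Any other scoring function that maps each protein to a number could be passed to `top_k_precision` in place of `dc`, which is how different ranking methods would be compared on the same known essential set.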