GA-PPI-Net: A Genetic Algorithm for Community Detection in Protein-Protein Interaction Networks

Community detection has become an important research direction for data mining in complex networks. It aims to identify topological structures and discover patterns in such networks. In this paper, we address the detection of communities in protein-protein or gene-gene interaction (PPI) networks. These networks represent sets of proteins or genes that collaborate in the same cellular function. The goal is to identify communities that are both semantically and topologically coherent, using gene annotation sources such as Gene Ontology. We propose a Genetic Algorithm (GA) based approach to detect communities of different sizes in PPI networks. For this purpose, we introduce three specific components to the GA: a fitness function based on a similarity measure and the interaction value between proteins or genes, a solution representation that encodes a community of dynamic size, and a specific mutation operator. In the computational tests carried out in this work, the proposed algorithm achieved excellent results in detecting existing and even new communities in PPI networks.
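The three components named above can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration and not the authors' method: the weighting parameter `alpha`, the mean-over-pairs fitness, the add/remove/swap mutation moves, and the names `fitness` and `mutate` are all hypothetical. A community is modeled as a set of protein identifiers, which lets its size vary.

```python
import random
from itertools import combinations


def fitness(community, similarity, interaction, alpha=0.5):
    """Mean pairwise score of a community.

    Assumption: each pair is scored as a weighted sum of a semantic
    similarity value and an interaction value; the abstract does not
    state how the two are combined. Dictionary keys are pairs (a, b)
    with a < b; missing pairs count as 0.
    """
    members = sorted(community)
    if len(members) < 2:
        return 0.0
    pairs = list(combinations(members, 2))
    total = 0.0
    for pair in pairs:
        total += (alpha * similarity.get(pair, 0.0)
                  + (1 - alpha) * interaction.get(pair, 0.0))
    return total / len(pairs)


def mutate(community, all_proteins, rng, min_size=2):
    """Return a mutated copy: add, remove, or swap one protein.

    Assumption: these three moves are a hypothetical way to let the
    community size change; the paper's operator may differ.
    """
    result = set(community)
    outside = sorted(set(all_proteins) - result)
    moves = []
    if outside:
        moves.append("add")
    if len(result) > min_size:
        moves.append("remove")
    if outside and result:
        moves.append("swap")
    if not moves:
        return result
    move = rng.choice(moves)
    if move in ("remove", "swap"):
        result.remove(rng.choice(sorted(result)))
    if move in ("add", "swap"):
        result.add(rng.choice(outside))
    return result


# Toy data (invented for the example, not taken from any database).
proteins = ["A", "B", "C", "D"]
similarity = {("A", "B"): 1.0, ("A", "C"): 0.0, ("B", "C"): 0.5}
interaction = {("A", "B"): 1.0, ("A", "C"): 0.5, ("B", "C"): 0.5}

rng = random.Random(0)
parent = {"A", "B"}
child = mutate(parent, proteins, rng)
```

In a full GA, `fitness` would rank candidate communities and `mutate` would be applied to selected individuals between generations; selection and crossover are omitted here.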
