Classification of Autism Genes Using Network Science and Linear Genetic Programming

Understanding the genetic background of complex diseases and disorders plays an essential role in the promising field of precision medicine. Deciphering which genes are associated with a specific disease or disorder helps to diagnose and treat it better, and may even prevent it if the associations are predicted accurately and acted on effectively at early stages. Evaluating candidate disease-associated genes, however, requires time-consuming and expensive experiments, given the large number of possibilities. Because of these challenges, computational methods are increasingly applied to predicting gene-disease associations. Given the intertwined relationships of molecules in human cells, genes and their products can be considered to form a complex molecular interaction network. Such a network can be used to find candidate genes that share similar network properties with known disease-associated genes. In this research, we investigate autism spectrum disorders and propose a linear genetic programming (LGP) algorithm for autism gene prediction that uses a human molecular interaction network and is trained on known autism genes. We select an initial set of network properties as features, and our LGP algorithm identifies the most relevant features while evolving accurate predictive models. Our research demonstrates the powerful and flexible learning abilities of genetic programming (GP) in tackling a significant biomedical problem, and is expected to inspire further exploration of wider GP applications.
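The approach combines two ingredients: node-level network properties used as features, and an LGP model whose effective instructions reveal which of those features matter. Below is a minimal sketch of both. It assumes a small hypothetical feature set (degree, local clustering coefficient, mean neighbour degree), a four-operator instruction set and a "positive output means autism gene" decision rule. The names `network_features`, `Instr`, `run`, `predict` and `effective_features` are illustrative and are not the authors' implementation. The evolutionary loop (selection, crossover, mutation) is omitted. Feature relevance is read off using a standard LGP technique: a backward pass that discards structural introns, meaning instructions that cannot affect the output register.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# --- Node features from an undirected interaction network -------------------
# Hypothetical feature set: [degree, clustering coefficient, mean neighbour degree].

def network_features(edges: List[Tuple[str, str]]) -> Dict[str, List[float]]:
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    feats: Dict[str, List[float]] = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        # Number of edges among the node's neighbours (each pair counted once).
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        cc = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
        mean_nbr_deg = sum(len(adj[n]) for n in nbrs) / k
        feats[node] = [float(k), cc, mean_nbr_deg]
    return feats

# --- Linear genetic programming (LGP) individual ----------------------------
# Registers 0..n_calc-1 are writable calculation registers (register 0 is the
# output); registers n_calc.. hold the read-only input features.

@dataclass
class Instr:
    dest: int  # calculation register to write
    op: str    # one of '+', '-', '*', '/'
    src1: int  # any register index
    src2: int

def run(program: List[Instr], x: List[float], n_calc: int = 4) -> float:
    regs = [0.0] * n_calc + list(x)
    for ins in program:
        if not 0 <= ins.dest < n_calc:
            raise ValueError("destination must be a calculation register")
        a, b = regs[ins.src1], regs[ins.src2]
        if ins.op == '+':
            val = a + b
        elif ins.op == '-':
            val = a - b
        elif ins.op == '*':
            val = a * b
        else:
            val = a / b if abs(b) > 1e-9 else 1.0  # protected division
        regs[ins.dest] = val
    return regs[0]

def predict(program: List[Instr], x: List[float], n_calc: int = 4) -> bool:
    # Hypothetical decision rule: positive output => predicted autism gene.
    return run(program, x, n_calc) > 0.0

def effective_features(program: List[Instr], n_calc: int = 4) -> Set[int]:
    # Backward pass: keep only instructions that can influence register 0;
    # the rest are structural introns. Return the feature indices they read.
    needed = {0}
    for ins in reversed(program):
        if ins.dest in needed:
            needed.discard(ins.dest)
            needed.update((ins.src1, ins.src2))
    return {r - n_calc for r in needed if r >= n_calc}

# --- Toy usage ---------------------------------------------------------------
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
feats = network_features(edges)
prog = [
    Instr(1, '*', 5, 5),  # r1 = cc * cc (intron: r1 never reaches r0)
    Instr(2, '+', 4, 4),  # r2 = 2 * degree
    Instr(0, '-', 2, 6),  # r0 = 2 * degree - mean neighbour degree
]
print(effective_features(prog))  # {0, 2}
print({g: predict(prog, f) for g, f in sorted(feats.items())})
# {'A': True, 'B': True, 'C': True, 'D': False}
```

In this toy program the clustering coefficient (feature 1) appears only in an intron, so the backward pass reports that the model relies on degree and mean neighbour degree alone. Aggregating such counts over the best evolved programs is one plausible way to rank features, under the assumptions stated above.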
