An Artificial Fish Swarm Algorithm for Identifying Associations between Multiple Variants and Multiple Phenotypes

Identifying associations between genomic variants and phenotypes is a long-standing research topic in population genetics, and it matters both for studying the pathogenesis of complex diseases and for supporting clinical decision making. Many methods have been proposed to find such associations, including GWAS and PheWAS, and they have produced important results in pathological research and clinical practice. However, existing methods address either a single phenotype with multiple variants or a single variant with multiple phenotypes, but not multiple variants with multiple phenotypes. Because complex diseases often have several subtypes that differ greatly in both variants and phenotypes, focusing on a single variant or a single phenotype limits what these methods can identify. We therefore propose a heuristic method that applies an artificial fish swarm algorithm (AFSA) framework to the solution space to identify associations between multiple variants and multiple phenotypes. In our method, each fish carries two logic trees: one represents the associations among variants and the other represents the associations among phenotypes. The logic trees are updated iteratively according to preset update strategies to find better solutions. When the stop condition is reached, the algorithm terminates and outputs the optimal fish. The logical expression represented by the logic trees of the optimal fish is the association we report. We validated the proposed method on simulated data generated by HAPGEN2 and PhenotypeSimulator. As the performance index we used Coverage, defined as the proportion of individuals that can be explained by the found logical expression. We conducted nine groups of experiments, each with a different number of variants and phenotypes. The best Coverage, 72.12%, came from the group with 500 variants and 10 phenotypes, and the worst, 31.73%, came from the group with 100 variants and 20 phenotypes. We also searched the simulated data exhaustively for the optimal logical expression and for several of the most important logic rules, and used them to evaluate the results obtained by the method.
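
The Coverage index described above can be illustrated with a minimal sketch. It is not the authors' implementation. The names `Leaf`, `Node`, `evaluate`, and `coverage` are hypothetical, the data are assumed to be binary (0/1) per individual, and the rule that an individual counts as "explained" when both the variant tree and the phenotype tree evaluate to true is an assumption, since the abstract does not state the exact criterion.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Leaf:
    """A single binary feature (variant or phenotype), optionally negated."""
    index: int
    negated: bool = False


@dataclass(frozen=True)
class Node:
    """An internal node combining two subtrees with "and" or "or"."""
    op: str
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def evaluate(tree: Tree, row: Sequence[int]) -> bool:
    """Evaluate a logic tree on one individual's 0/1 feature row."""
    if isinstance(tree, Leaf):
        value = bool(row[tree.index])
        return (not value) if tree.negated else value
    left = evaluate(tree.left, row)
    right = evaluate(tree.right, row)
    if tree.op == "and":
        return left and right
    if tree.op == "or":
        return left or right
    raise ValueError(f"unknown operator: {tree.op!r}")


def coverage(
    variant_tree: Tree,
    phenotype_tree: Tree,
    variants: Sequence[Sequence[int]],
    phenotypes: Sequence[Sequence[int]],
) -> float:
    """Fraction of individuals for which both trees evaluate to true.

    ASSUMPTION: "explained" means both trees are true for the individual;
    the source does not define the criterion precisely.
    """
    if len(variants) != len(phenotypes):
        raise ValueError("variants and phenotypes must describe the same individuals")
    if not variants:
        return 0.0
    explained = sum(
        1
        for v_row, p_row in zip(variants, phenotypes)
        if evaluate(variant_tree, v_row) and evaluate(phenotype_tree, p_row)
    )
    return explained / len(variants)


# Toy data: four individuals, two variants, one phenotype (all hypothetical).
toy_variants = [[1, 0], [1, 1], [0, 1], [0, 0]]
toy_phenotypes = [[1], [1], [0], [1]]
toy_variant_tree = Node("or", Leaf(0), Leaf(1))   # variant 0 OR variant 1
toy_phenotype_tree = Leaf(0)                      # phenotype 0
toy_score = coverage(toy_variant_tree, toy_phenotype_tree, toy_variants, toy_phenotypes)
```

In a fish swarm search, a score of this kind could serve as the fitness used to compare fish, with each fish holding one variant tree and one phenotype tree. How the update strategies modify the trees is not specified in the abstract and is left out of the sketch.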
