JS-MA: A Jensen-Shannon Divergence Based Method for Mapping Genome-Wide Associations on Multiple Diseases

Taking advantage of high-throughput Single Nucleotide Polymorphism (SNP) genotyping technology, Genome-Wide Association Studies (GWASs) have been successfully used to define the relative roles of genes and the environment in disease risk, helping to enable preventive and precision medicine. However, current multi-locus methods fall short in both computational cost and discrimination power when detecting statistically significant interactions with different genetic effects on multiple diseases. Statistical tests for multi-locus interactions (≥2 SNPs) pose major analytical challenges because the computational cost grows exponentially with the number of SNPs in an interaction module. In this paper, we develop a simple, fast, and powerful method, named JS-MA, based on Jensen-Shannon divergence and agglomerative hierarchical clustering, to detect genome-wide multi-locus interactions associated with multiple diseases. In systematic simulations, JS-MA is more powerful and more efficient than state-of-the-art association mapping tools. We applied JS-MA to real GWAS datasets for two common diseases, Rheumatoid Arthritis and Type 1 Diabetes. The results show that JS-MA not only confirms recently reported, biologically meaningful associations but also identifies novel multi-locus interactions. We therefore believe that JS-MA is suitable and efficient for full-scale analysis of multi-disease-related interactions in large GWASs.
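To make the central quantity concrete, the following is a minimal sketch of the Jensen-Shannon divergence between the genotype distributions of two sample groups (for example, cases of one disease versus controls) at a two-SNP module. It is an illustration of the divergence only, not the JS-MA implementation: the function names, the 0/1/2 genotype coding, and the toy data are all assumptions introduced here, and the clustering step is not shown.

```python
import math
from collections import Counter


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions.

    p and q are equal-length sequences of probabilities that each sum to 1.
    The result lies in [0, 1]: 0 for identical distributions, 1 for disjoint ones.
    """
    def kl(a, b):
        # Kullback-Leibler divergence; terms with a == 0 contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution; it is positive wherever p or q is positive.
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def genotype_distribution(samples, n_snps):
    """Empirical distribution over joint genotypes of an n_snps module.

    Assumption: each sample is a tuple of genotypes coded 0, 1, or 2,
    so a module of n_snps SNPs has 3 ** n_snps joint states.
    """
    counts = Counter()
    for genotypes in samples:
        state = 0
        for g in genotypes:
            state = state * 3 + g  # encode the tuple as a base-3 index
        counts[state] += 1
    total = len(samples)
    return [counts.get(s, 0) / total for s in range(3 ** n_snps)]


# Hypothetical toy data: joint genotypes at two SNPs for cases and controls.
cases = [(2, 2), (2, 1), (2, 2), (1, 2)]
controls = [(0, 0), (0, 1), (1, 0), (0, 0)]

p_cases = genotype_distribution(cases, n_snps=2)
p_controls = genotype_distribution(controls, n_snps=2)
score = js_divergence(p_cases, p_controls)
```

A larger divergence indicates that the joint genotype distribution differs more between the two groups. The number of joint states, 3 ** n_snps, also shows why the cost of exhaustive testing grows exponentially with module size.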
