Ordered multinomial regression for genetic association analysis of ordinal phenotypes at Biobank scale

Logistic regression is the primary analysis tool for binary traits in genome-wide association studies (GWAS), and multinomial regression extends it to traits with more than two categories. However, many phenotypes more naturally take ordered, discrete values. Examples include (a) subtypes defined from multiple sources of clinical information and (b) derived phenotypes generated by phenotyping algorithms for electronic health records (EHR). Standard approaches to GWAS of ordinal traits each have drawbacks. Dichotomizing requires a cutoff, and different arbitrary cutoffs can generate inconsistent, hard-to-interpret results. Multinomial regression ignores the ordering of trait values and can lose power. Treating ordinal data as quantitative can lead to misleading inference. To address these issues, we analyze ordinal traits with an ordered multinomial model. This approach increases power and yields more interpretable results. We derive efficient algorithms for computing test statistics, making ordinal-trait GWAS computationally practical for Biobank-scale data. Our method is available as the Julia package OrdinalGWAS.jl. Application to the COPDGene study confirms signals previously found using binary case–control status, but with greater significance. We also demonstrate that the package can analyze UK Biobank data by treating hypertension as an ordinal trait.
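The abstract does not write out the ordered multinomial model. What follows is a minimal sketch that assumes the common cumulative-logit (proportional odds) parameterization P(Y ≤ j) = logistic(θ_j − η). Here θ_1 < … < θ_{J−1} are increasing thresholds and η is a linear predictor that would include a SNP's genotype term. The code shows how one η yields a probability for every ordered category, and why a single SNP effect γ describes all cutoffs at once, in contrast with dichotomizing at one arbitrary cutoff. The function names, threshold values, and effect size are hypothetical illustrations. They are not the OrdinalGWAS.jl API, and the sketch covers only the model, not the paper's test-statistic algorithms.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic CDF, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_probs(thresholds, eta):
    """Category probabilities under an ASSUMED cumulative-logit model.

    P(Y <= j) = logistic(theta_j - eta) for j = 1..J-1, so larger eta
    shifts probability mass toward higher categories.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities, closed off with P(Y <= J) = 1.
    cum = [logistic(t - eta) for t in thresholds] + [1.0]
    # Each category's probability is the difference of adjacent cumulatives.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def exceedance_odds(thresholds, eta):
    """Odds of Y > j, i.e. P(Y > j) / P(Y <= j), for each cutoff j."""
    out = []
    for t in thresholds:
        p_le = logistic(t - eta)
        out.append((1.0 - p_le) / p_le)
    return out


# Hypothetical 4-level ordinal trait and per-allele SNP effect.
theta = [-1.0, 0.0, 1.0]
gamma = 0.3

probs = ordinal_probs(theta, eta=0.0)
base = exceedance_odds(theta, eta=0.0)
shifted = exceedance_odds(theta, eta=gamma)

# Under proportional odds, every cutoff shares the same odds ratio exp(gamma)
# (up to floating-point rounding), so one parameter summarizes the SNP effect.
ratios = [s / b for s, b in zip(shifted, base)]
print(probs)
print(ratios, math.exp(gamma))
```

The ratios all match exp(γ) because, under this assumed parameterization, the odds of exceeding cutoff j equal exp(η − θ_j). A dichotomized analysis would estimate that effect from one cutoff only.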
