Title: Biological and clinical insights from genetics of insomnia symptoms
Short Title: Insights from genetics of insomnia symptoms

Insomnia is a common disorder linked with adverse long-term medical and psychiatric outcomes, but underlying pathophysiological processes and causal relationships with disease are poorly understood. Here we identify 57 loci for self-reported insomnia symptoms in the UK Biobank (n=453,379) and confirm their impact on self-reported insomnia symptoms in the HUNT study (n=14,923 cases, 47,610 controls), physician-diagnosed insomnia in Partners Biobank (n=2,217 cases, 14,240 controls), and accelerometer-derived measures of sleep efficiency and sleep duration in the UK Biobank (n=83,726). Our results suggest enrichment of genes involved in ubiquitin-mediated proteolysis, phototransduction and muscle development pathways and of genes expressed in multiple brain regions, skeletal muscle and adrenal gland. Evidence of shared genetic factors is found between frequent insomnia symptoms and restless legs syndrome, aging, cardio-metabolic, behavioral, psychiatric and reproductive traits. Evidence is found for a possible causal link between insomnia symptoms and coronary heart disease, depressive symptoms and subjective well-being.
One Sentence Summary: We identify 57 genomic regions associated with insomnia pointing to the involvement of phototransduction and ubiquitination and potential causal links to CAD and depression.
Insomnia disorder, defined by persistent difficulty in initiating or maintaining sleep, and corresponding daytime dysfunction, occurs in roughly 10-20% of the population, and leads to high socioeconomic costs and substantial lifetime morbidity 1. Up to one-third of the population experience transient insomnia symptoms at any given time 2. Longitudinal studies suggest that insomnia increases the risk for developing anxiety disorders, alcohol abuse, major depression and cardio-metabolic disease 3.
Despite its high prevalence and a hypothesized strong bidirectional link between insomnia and psychiatric disorders, little is known about underlying pathophysiologic mechanisms. Cognitive-behavioral therapies are the recommended first-line treatment approach but access is limited 4,5. Common drug treatments target synaptic neurotransmission (via GABAergic pathways), cortical arousal (via histamine receptors), or the melatonin system, but these drugs have variable effectiveness, may be habit forming and have important side effects 6,7. A better understanding of the etiology and pathophysiological processes would enable identification of new personalized therapeutic strategies for insomnia. Family-based heritability estimates suggest that insomnia has a genetic component (22%–25%) 8. Recent GWAS in the first release of genetic data from the UK Biobank have reported four loci for insomnia symptoms (at MEIS1, TMEM132E, CYCL1 and SCFD2) 9,10, but insights into underlying biological pathways and causal genetic links with disease are limited.
In this study, we aimed to 1) discover novel genetic loci for self-reported insomnia symptoms using GWAS, 2) validate findings in independent clinical and population samples and in participants with activity-monitor-derived measures of sleep patterns, 3) gain biological insights by gene, pathway and tissue-enrichment analyses and bioinformatic annotation, 4) investigate shared genetics with behavioral and disease traits, and 5) test for causal links between insomnia and relevant diseases/traits.
In UK Biobank participants of European ancestry (n=453,379), 29% self-reported frequent insomnia symptoms, with a higher prevalence in women than men (32% vs. 24%). Consistent with previous studies, insomnia symptoms were more prevalent in older participants, shift workers and those with shorter self-reported sleep duration (Supplementary Table 1).
We performed two parallel GWAS in participants self-reporting insomnia symptoms: 1) frequent insomnia symptoms (never/rarely vs. usually insomnia symptoms, n=129,270) and 2) any insomnia symptoms (never/rarely vs. sometimes/usually insomnia symptoms, n=345,022 cases), using 14,661,600 genetic variants across the autosomes and X chromosome. We identified 57 association signals (Fig. 1, Supplementary Table 2, Supplementary Fig. 1-2). Of these, 20 loci were identified in both analyses, 28 loci were identified only in the analysis of frequent insomnia symptoms, and 9 only in the analysis of any insomnia symptoms (Supplementary Table 2). Conditional analyses identified no secondary association signals. The 57 genetic associations were independent of established or putative insomnia risk factors, as sensitivity analyses adjusting for BMI, lifestyle, caffeine consumption, and depression or recent stress did not notably alter the magnitude or direction of effect estimates (Supplementary Table 3).
The MEIS1 association signal identified in the interim release of the UK Biobank was confirmed in the remainder of the UK Biobank sample (excluding the interim subjects and relatives of interim subjects) (n=75,508 cases of frequent insomnia symptoms and 64,403 controls; rs113851554 T allele: OR [95% CI] 1.19 [1.15-1.23], p=1.5 x 10^-21), and nominal replication was seen for the previously reported CYCL1 signal (p=9.0 x 10^-3). The TMEM132E and SCFD2 signals showed a direction of effect concordant with the initial subsample but were not significant, perhaps reflecting selection bias in the initial subsample, which carried a high respiratory disease burden from the UK BiLEVE study 11. No other findings from previous candidate gene association studies or smaller GWAS were confirmed (Supplementary Table 4).
Secondary GWAS excluding current shift workers or individuals reporting hypnotic, anxiolytic or psychiatric medication usage, and/or with selected chronic diseases or psychiatric illnesses (n=76,470 participants excluded), revealed strong pair-wise genetic correlation with the primary GWAS (rg~1) and did not identify any additional association signals (Supplementary Fig. 1-3). Thus, biological processes underlying the pathophysiology of insomnia symptoms may be common between the general population and those with co-morbidities, in accordance with the recent clinical reclassification of primary and secondary insomnia diagnoses into a single insomnia disorder 12.
The prevalence of insomnia symptoms varies by sex; therefore, we performed secondary sex-stratified GWAS for both frequent and any insomnia symptoms. Thirteen loci were found (8 in women and 5 in men), of which seven demonstrated evidence of sex interactions (p_sex-int < 3 x 10^-4; with stronger effects in women at KRT8P18, NT5C2, NMT1, CCDC148 and C11ORF49, and stronger effects in men at CADM1 and SLC8A3; Supplementary Table 5). Effects in women were not modified by menopausal status (Supplementary Table 5). Furthermore, as described previously 9,10, the genetic architecture for frequent insomnia symptoms differed by sex, with a genetic correlation between the stratified GWAS of rg=0.807 (Supplementary Fig. 3).
Given the limitations of the self-report of insomnia symptoms 13, we sought additional replication and validation of genetic association signals. First, we tested our 57 lead variants for association with self-reported insomnia symptoms in participants from the population-based HUNT study (n=14,923 cases, 47,610 controls; characteristics described in Supplementary Table 6) 14. Replication was observed for the MEIS1 variant, and 40/57 variants showed a consistent direction of effect across both studies (binomial test p=9 x 10^-4; Supplementary Table 7).
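The direction-of-effect comparison above is a sign test: under the null hypothesis that each lead variant is equally likely to point in either direction, the number of concordant variants follows a Binomial(57, 0.5) distribution. The following is a minimal sketch of that calculation, not the authors' code; the function names are illustrative, and the text does not state whether the reported p-value is one-sided or two-sided, so both are computed and neither is expected to reproduce the published figure exactly.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def sign_test(concordant: int, total: int) -> tuple[float, float]:
    """Return (one-sided, two-sided) exact sign-test p-values.

    Null hypothesis: each variant has a 50% chance of showing a
    concordant direction of effect in the replication cohort.
    """
    upper = binomial_upper_tail(concordant, total)          # P(X >= k)
    lower = 1.0 - binomial_upper_tail(concordant + 1, total)  # P(X <= k)
    two_sided = min(1.0, 2.0 * min(upper, lower))
    return upper, two_sided


# Counts taken from the text: 40 of 57 lead variants were concordant.
one_sided, two_sided = sign_test(40, 57)
print(f"one-sided p = {one_sided:.2e}, two-sided p = {two_sided:.2e}")
```

An exact tail sum is used here rather than a normal approximation because 57 trials is small enough for the approximation to shift the p-value noticeably.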
A genetic risk score of 57 variants (GRS) weighted by effect estimates from the primary UK Biobank GWAS was also associated with insomnia symptoms in HUNT (OR [95% CI] 1.015 [1.01-1.02] per allele, p=2.71 x 10^-11; Table 1).
Second, we tested for and found an association of the GRS with physician-diagnosed insomnia in the Partners Biobank (n=2,217 cases, 14,240 controls; OR [95% CI] 1.017 [1.007-1.027] per allele, p=8.88 x 10^-4; Table 1, Supplementary Table 8).
Third, to investigate the impact of genetic variants on objective sleep patterns, we tested the 57 lead variants for association with 8 activity-monitor measures of sleep fragmentation, duration and timing in a subset of the UK Biobank participants of European ancestry who had undergone 7 days of wrist-worn accelerometry (n=84,745, characteristics described in Supplementary Table 6). The lead MEIS1 risk variant was associated with a higher number of sleep episodes, lower sleep efficiency, shorter sleep duration and later sleep timing (p<0.0008; Supplementary Table 9). The GRS was associated with reduced sleep efficiency (difference = -0.04 (0.01) % per allele; p=4 x 10^-14), shorter sleep duration (difference = -0.25 (0.035) mins per allele; p=8 x 10^-13) and greater day-to-day variability in sleep duration (difference = 0.077 (0.025) mins per allele; p=0.0017) but not with the number of sleep episodes or diurnal inactivity duration (Table 1, Supplementary Table 9).
To gain insight into the probable causal variant underlying each of the 57 genetic association signals, we performed fine-mapping based on 1000 Genomes Project (1KG) linkage information using credible set analysis in PICS 15 and identified 38 variants with a causal probability of 0.20 or greater (Supplementary Table 10).
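For readers unfamiliar with weighted genetic risk scores such as the GRS used in the replication analyses above, the following is a minimal sketch of one common construction. It is not the authors' pipeline: the rescaling to allele units (multiplying by the number of variants divided by the sum of weights) is an assumption made so that the score reads "per allele", the function names are hypothetical, and the toy weights and dosages are invented for illustration. It assumes every variant is coded so that its weight (log odds ratio) is positive.

```python
from math import exp, log


def weighted_grs(dosages: list[float], betas: list[float]) -> float:
    """Weighted risk score rescaled to units of risk alleles.

    dosages: risk-allele counts (0..2) per variant for one person.
    betas:   per-variant log odds ratios from the discovery GWAS,
             aligned so that every beta is positive (assumption).
    """
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    raw = sum(b * d for b, d in zip(betas, dosages))
    # Assumed rescaling: with equal weights this reduces to a plain allele count.
    scale = len(betas) / sum(betas)
    return raw * scale


def odds_ratio_for_extra_alleles(or_per_allele: float, n_alleles: float) -> float:
    """Compound a per-allele odds ratio over n additional risk alleles."""
    return exp(n_alleles * log(or_per_allele))


# Invented toy data: three variants, one person.
toy_betas = [0.05, 0.03, 0.02]
toy_dosages = [2, 1, 0]
print(weighted_grs(toy_dosages, toy_betas))

# Per-allele OR of 1.015 is taken from the text (HUNT); 10 alleles is arbitrary.
print(odds_ratio_for_extra_alleles(1.015, 10))
```

Because per-allele odds ratios compound multiplicatively, a small per-allele effect can still separate individuals at opposite ends of the score distribution.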
The majority of likely causal variants lie within introns (34%) or downstream of a gene (22%), consistent with previous literature demonstrating that non-coding variation causally influences the majority of phenotypic associations for complex traits (Supplementary Fig. 4) 16. This list includes missense variants located in the NAD kinase gene NADK (N262K) and in MDGA1 (L61P), as well as rs324017, a SNP within the gene encoding the transcriptional repressor NAB1 (also known as EGR-1 binding protein) that is predicted to disrupt a binding site for EGR1 (Supplementary Table 10), a transcription factor involved in response to stress 17 and in synaptic plasticity during REM sleep 17 (Supplementary Table 9). The 57 insomnia symptoms loci lie in genomic regions encompassing up to 236 genes, and a summa

[1]  E. V. van Someren,et al.  Insomnia heterogeneity: Characteristics to consider for data-driven multivariate subtyping. , 2017, Sleep medicine reviews.

[2]  T. Paunio,et al.  European guideline for the diagnosis and treatment of insomnia , 2017, Journal of sleep research.

[3]  Erdogan Taskesen,et al.  Functional mapping and annotation of genetic associations with FUMA , 2017, Nature Communications.

[4]  Stephen H. Bell,et al.  Identification of novel risk loci for restless legs syndrome in genome-wide association studies in individuals of European ancestry , 2022 .

[5]  Lloyd T. Elliott,et al.  The genetic basis of human brain structure and function: 1,262 genome-wide associations found from 3,144 GWAS of multimodal brain imaging phenotypes from 9,707 UK Biobank participants , 2017, bioRxiv.

[6]  Susan Redline,et al.  Insomnia and Risk of Cardiovascular Disease. , 2017, Chest.

[7]  P. Donnelly,et al.  Genome-wide genetic data on ~500,000 UK Biobank participants , 2017, bioRxiv.

[8]  Kristin G Ardlie,et al.  Genetic Analysis in UK Biobank Links Insulin Resistance and Transendothelial Migration Pathways to Coronary Artery Disease , 2017, Nature Genetics.

[9]  H. Stefánsson,et al.  Genome-wide association analysis of insomnia complaints identifies risk genes and genetic overlap with psychiatric and metabolic traits , 2017, Nature Genetics.

[10]  M. Kabbaj,et al.  The Role of Early Growth Response 1 (EGR1) in Brain Plasticity and Neuropsychiatric Disorders , 2017, Front. Behav. Neurosci..

[11]  Daniel J Buysse,et al.  Clinical Practice Guideline for the Pharmacologic Treatment of Chronic Insomnia in Adults: An American Academy of Sleep Medicine Clinical Practice Guideline. , 2017, Journal of clinical sleep medicine : JCSM : official publication of the American Academy of Sleep Medicine.

[12]  Nils Y. Hammerla,et al.  Large Scale Population Assessment of Physical Activity Using Wrist Worn Accelerometers: The UK Biobank Study , 2017, PloS one.

[13]  Xiaofeng Zhu,et al.  Genome-wide association analyses of sleep disturbance traits identify new loci and highlight shared genetics with neuropsychiatric and metabolic traits , 2016, Nature Genetics.

[14]  Hashem A. Shihab,et al.  MR-Base: a platform for systematic causal inference across the phenome using billions of genetic associations , 2016, bioRxiv.

[15]  P. Gehrman,et al.  Genetic Pathways to Insomnia , 2016, Brain sciences.

[16]  Alan M. Kwong,et al.  Next-generation genotype imputation service and methods , 2016, Nature Genetics.

[17]  Andrew D. Rouillard,et al.  Enrichr: a comprehensive gene set enrichment analysis web server 2016 update , 2016, Nucleic Acids Res..

[18]  Tom R. Gaunt,et al.  LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis , 2016, bioRxiv.

[19]  G. Davey Smith,et al.  Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator , 2016, Genetic epidemiology.

[20]  Daniel Marbach,et al.  Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics , 2016, PLoS Comput. Biol..

[21]  T. Lehtimäki,et al.  Integrative approaches for large-scale transcriptome-wide association studies , 2015, Nature Genetics.

[22]  Michael Catt,et al.  A Novel, Open Access Method to Assess Sleep Duration Using a Wrist-Worn Accelerometer , 2015, PloS one.

[23]  L. Wain,et al.  Novel insights into the genetics of smoking behaviour, lung function, and chronic obstructive pulmonary disease (UK BiLEVE): a genetic association study in UK Biobank , 2015, The Lancet. Respiratory medicine.

[24]  Yakir A Reshef,et al.  Partitioning heritability by functional annotation using genome-wide association summary statistics , 2015, Nature Genetics.

[25]  Denise Sharon,et al.  Restless Legs Syndrome and Sleep Related Movement Disorders. , 2015, Sleep medicine clinics.

[26]  J. Lupski,et al.  Non-coding genetic variants in human disease. , 2015, Human molecular genetics.

[27]  J. Gill,et al.  Improved Sleep in Military Personnel is Associated with Changes in the Expression of Inflammatory Genes and Improvement in Depression Symptoms , 2015, Front. Psychiatry.

[28]  M. Daly,et al.  An Atlas of Genetic Correlations across Human Diseases and Traits , 2015, Nature Genetics.

[29]  G. Davey Smith,et al.  Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression , 2015, International journal of epidemiology.

[30]  P. Elliott,et al.  UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age , 2015, PLoS medicine.

[31]  M. Daly,et al.  Genetic and Epigenetic Fine-Mapping of Causal Autoimmune Disease Variants , 2014, Nature.

[32]  B. Berger,et al.  Efficient Bayesian mixed model analysis increases association power in large cohorts , 2014, Nature Genetics.

[33]  M. Daly,et al.  LD Score regression distinguishes confounding from polygenicity in genome-wide association studies , 2014, Nature Genetics.

[34]  Laura Marshall,et al.  Insomnia disorder , 2015, Nature Reviews Disease Primers.

[35]  Joss Langford,et al.  Autocalibration of accelerometer data for free-living physical activity assessment using local gravity and temperature: an evaluation on four continents , 2014, Journal of applied physiology.

[36]  P. Dargan,et al.  Misuse of benzodiazepines and Z-drugs in the UK , 2014, British Journal of Psychiatry.

[37]  Ross M. Fraser,et al.  A General Approach for Haplotype Phasing across the Full Spectrum of Relatedness , 2014, PLoS genetics.

[38]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[39]  Jonathan J. Evans,et al.  Prevalence and Characteristics of Probable Major Depression and Bipolar Disorder within UK Biobank: Cross-Sectional Study of 172,751 Participants , 2013, PloS one.

[40]  A. Butterworth,et al.  Mendelian Randomization Analysis With Multiple Genetic Variants Using Summarized Data , 2013, Genetic epidemiology.

[41]  K. Hveem,et al.  COHORT PROFILE Cohort Profile : The HUNT Study , Norway , 2013 .

[42]  P. Renshaw,et al.  Increased Rostral Anterior Cingulate Cortex Volume in Chronic Primary Insomnia. , 2013, Sleep.

[43]  Vihang N. Vahia,et al.  Diagnostic and statistical manual of mental disorders 5: A quick glance , 2013, Indian journal of psychiatry.

[44]  Alexander Horsch,et al.  Separating Movement and Gravity Components in an Acceleration Signal and Implications for the Assessment of Human Daily Physical Activity , 2013, PloS one.

[45]  Avi Ma'ayan,et al.  Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool , 2013, BMC Bioinformatics.

[46]  S. Sanyal,et al.  An emerging role for Cullin-3 mediated ubiquitination in sleep and circadian rhythm , 2013, Fly.

[47]  A. Pack,et al.  Sleep is not just for the brain: transcriptional responses to sleep in peripheral tissues , 2013, BMC Genomics.

[48]  C. Morin,et al.  Chronic insomnia , 2012, The Lancet.

[49]  M. W. Young,et al.  insomniac and Cullin-3 Regulate Sleep and Wakefulness in Drosophila , 2011, Neuron.

[50]  D. Kromhout,et al.  Sleep duration and sleep quality in relation to 12-year cardiovascular disease incidence: the MORGEN study. , 2011, Sleep.

[51]  M. Qiu,et al.  Role of Basal Ganglia in Sleep–Wake Regulation: Neural Circuitry and Clinical Significance , 2010, Front. Neuroanat..

[52]  F. Schmidt Meta-Analysis , 2008 .

[53]  Thomas G. Rawski QUICK GLANCE , 2008 .

[54]  Manuel A. R. Ferreira,et al.  PLINK: a tool set for whole-genome association and population-based linkage analyses. , 2007, American journal of human genetics.