NIAGADS Alzheimer's GenomicsDB: A resource for exploring Alzheimer's disease genetic and genomic knowledge

INTRODUCTION: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site Alzheimer's Genomics Database (GenomicsDB) is a public knowledge base of Alzheimer's disease (AD) genetic datasets and genomic annotations.

METHODS: GenomicsDB uses a custom systems architecture to adopt and enforce rigorous standards that facilitate harmonization of AD-relevant genome-wide association study (GWAS) summary statistics datasets with functional annotations, including over 230 million annotated variants from the AD Sequencing Project.

RESULTS: GenomicsDB generates interactive reports compiled from the harmonized datasets and annotations. These reports contextualize AD-risk associations in a broader functional genomic setting and summarize them in the context of functionally annotated genes and variants.

DISCUSSION: Created to make AD-genetics knowledge more accessible to AD researchers, the GenomicsDB is designed to guide users unfamiliar with genetic data in not only exploring but also interpreting this ever-growing volume of data. Scalable and interoperable with other genomics resources using data technology standards, the GenomicsDB can serve as a central hub for research and data analysis on AD and related dementias.

HIGHLIGHTS: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) offers to the public a unique, disease-centric collection of AD-relevant GWAS summary statistics datasets. Interpreting these data is challenging and requires significant bioinformatics expertise to standardize datasets and harmonize them with functional annotations on genome-wide scales. The NIAGADS Alzheimer's GenomicsDB helps overcome these challenges by providing a user-friendly public knowledge base for AD-relevant genetics that shares harmonized, annotated summary statistics datasets from the NIAGADS repository in an interpretable, easily searchable format.
