GenomeChronicler: The Personal Genome Project UK Genomic Report Generator Pipeline

In recent years, there has been a significant increase in whole genome sequencing data of individual genomes produced by research projects as well as direct-to-consumer service providers. While many of these sources provide their users with an interpretation of the data, there is a lack of free, open tools for generating reports that explore the data in an easy-to-understand manner. GenomeChronicler was developed as part of the Personal Genome Project UK (PGP-UK) to address this need. PGP-UK provides genomic, transcriptomic, epigenomic and self-reported phenotypic data under an open-access model with full ethical approval. As a result, the reports generated by GenomeChronicler are intended for research purposes only: they include information on potentially beneficial and potentially harmful variants, but without clinical curation. GenomeChronicler can be used with data from whole genome or whole exome sequencing, producing a genome report containing information on variant statistics, ancestry and known associated phenotypic traits. Example reports are available from the PGP-UK data page (personalgenomes.org.uk/data). The objective of this method is to leverage existing resources to find known phenotypes associated with the genotypes detected in each sample. The trait data provided is based primarily on information available in SNPedia, but also collates data from ClinVar, GET-Evidence and gnomAD to provide additional details on potential health implications, the presence of each genotype in other PGP participants and the population frequency of each genotype. The analysis can be run in a self-contained environment without requiring internet access, making it a good choice where privacy is essential or desired: any third-party project can embed GenomeChronicler within its offline safe-haven environment. GenomeChronicler can be run for one sample at a time, or for many samples in parallel using the Nextflow workflow manager.
The source code is available from GitHub (https://github.com/PGP-UK/GenomeChronicler). Container recipes are available for Docker and Singularity, and a pre-built container is available from SingularityHub (https://singularity-hub.org/collections/3664), enabling easy deployment in a variety of settings. Users without access to the computational resources needed to run GenomeChronicler can access the software from the Lifebit CloudOS platform (https://lifebit.ai/cloudos), which enables the production of reports and variant calls from raw sequencing data in a scalable fashion.
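
The central idea described above, matching each genotype detected in a sample against a locally stored table of known genotype-phenotype associations without any network access, can be illustrated with a minimal sketch. This is not GenomeChronicler's actual implementation: the table contents, the identifiers (rsEXAMPLE1 and so on), and the function names annotate and normalise_genotype are all hypothetical placeholders, and the sketch ignores strand orientation and other details a real pipeline would need to handle.

```python
from typing import Dict, List, NamedTuple, Tuple


class Annotation(NamedTuple):
    trait: str
    source: str


# Hypothetical local annotation table keyed by (variant id, genotype).
# The identifiers and traits are made-up placeholders, not real database content.
LOCAL_ANNOTATIONS: Dict[Tuple[str, str], Annotation] = {
    ("rsEXAMPLE1", "AG"): Annotation("example trait A", "local SNPedia-style table"),
    ("rsEXAMPLE2", "CC"): Annotation("example trait B", "local ClinVar-style table"),
}


def normalise_genotype(alleles: str) -> str:
    """Upper-case and sort the alleles so that 'GA' and 'AG' share one key."""
    return "".join(sorted(alleles.upper()))


def annotate(sample_genotypes: Dict[str, str]) -> List[Tuple[str, str, Annotation]]:
    """Return (variant id, genotype, annotation) for every genotype found in the table."""
    hits = []
    for variant_id, alleles in sorted(sample_genotypes.items()):
        genotype = normalise_genotype(alleles)
        annotation = LOCAL_ANNOTATIONS.get((variant_id, genotype))
        if annotation is not None:
            hits.append((variant_id, genotype, annotation))
    return hits


if __name__ == "__main__":
    # Hypothetical genotypes for one sample; only the first matches the table.
    sample = {"rsEXAMPLE1": "GA", "rsEXAMPLE2": "CT", "rsEXAMPLE3": "TT"}
    for variant_id, genotype, annotation in annotate(sample):
        print(f"{variant_id}\t{genotype}\t{annotation.trait}\t{annotation.source}")
```

Because the lookup table lives entirely in memory (or, in practice, in a local file or database), a step like this needs no internet connection, which is the property that makes offline safe-haven deployment possible.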
