The Personal Genome Project-UK, an open access resource of human multi-omics data

Integrative analysis of multi-omics data is a powerful approach for gaining functional insights into biological and medical processes. Conducting these multifaceted analyses on human samples is often complicated by the fact that the raw sequencing output is rarely available under open access. The Personal Genome Project UK (PGP-UK) is one of the few resources that recruit participants under open consent and make the resulting multi-omics data freely and openly available. As part of this resource, we describe the PGP-UK multi-omics reference panel, consisting of ten genomic, methylomic and transcriptomic datasets. Specifically, we outline the data processing, quality control and validation procedures that were implemented to ensure data integrity and exclude sample mix-ups. In addition, we provide a REST API to facilitate the download of the entire PGP-UK dataset. The data are also available from two cloud-based environments, which provide platforms for free integrated analysis. In conclusion, the genotype-validated PGP-UK multi-omics human reference panel described here provides a valuable new open access resource for integrated analyses in support of personal and medical genomics.
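
The abstract does not detail how sample mix-ups are excluded, so the following is only a minimal sketch of one common approach, assuming genotype calls at shared variant sites can be derived from each data type and compared. The function names, the dictionary layout and the sample and variant identifiers are hypothetical and are not taken from the PGP-UK pipeline.

```python
from typing import Dict, Tuple

# Hypothetical layout: variant ID -> unphased genotype call, e.g. "AG".
Genotypes = Dict[str, str]


def concordance(a: Genotypes, b: Genotypes) -> float:
    """Return the fraction of shared variant sites where two call sets agree."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("call sets share no variant sites")
    # Sort alleles so that "AG" and "GA" count as the same unphased genotype.
    matches = sum(1 for site in shared if sorted(a[site]) == sorted(b[site]))
    return matches / len(shared)


def best_match(query: Genotypes, panel: Dict[str, Genotypes]) -> Tuple[str, float]:
    """Return the panel sample ID with the highest concordance to the query."""
    scores = {sample_id: concordance(query, calls) for sample_id, calls in panel.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Illustrative, made-up calls: genome-derived genotypes for two samples and
# genotypes inferred from another assay that is labelled as "sampleA".
genome_calls = {
    "sampleA": {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs4": "AG"},
    "sampleB": {"rs1": "AA", "rs2": "CT", "rs3": "TT", "rs4": "GG"},
}
assay_calls_labelled_a = {"rs1": "GA", "rs2": "CC", "rs3": "TT", "rs4": "AG"}

matched_id, score = best_match(assay_calls_labelled_a, genome_calls)
```

If the best-matching sample differs from the label attached to the assay, or if the top concordance is low, the sample would be flagged for review. Any threshold for "low" would have to be chosen empirically for the data types being compared.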
