Framework for quality assessment of whole genome cancer sequences

Bringing together cancer genomes from different projects increases statistical power and allows the investigation of pan-cancer molecular mechanisms. However, working with whole genomes sequenced over several years in different sequencing centres requires a framework for comparing the quality of these sequences. We used the Pan-Cancer Analysis of Whole Genomes cohort as a test case to construct such a framework. This cohort contains whole cancer genomes from 2832 donors and 18 sequencing centres. We developed a non-redundant set of five quality control (QC) measures and used them to establish a star rating system. These QC measures reflect known differences in sequencing protocol, provide a guide for downstream analyses, and allow samples of poor quality to be excluded. We found this to be an effective framework of quality measures. The implementation of the framework is available at: https://dockstore.org/containers/quay.io/jwerner_dkfz/pancanqc:1.2.2.
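As a rough illustration of how five QC measures could be combined into a star rating, the sketch below awards one star for each measure that passes its threshold. The measure names (`qc_1` to `qc_5`), the thresholds, the pass/fail scoring rule and the `min_stars` cut-off are all hypothetical placeholders chosen for this example. They are not taken from the published framework, which may define and score its measures differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class QCMeasure:
    """One QC measure with a pass/fail rule (hypothetical, for illustration)."""
    name: str
    passes: Callable[[float], bool]


# Hypothetical placeholder measures and thresholds; NOT the published values.
MEASURES: List[QCMeasure] = [
    QCMeasure("qc_1", lambda v: v >= 30.0),  # higher is better
    QCMeasure("qc_2", lambda v: v <= 1.5),   # lower is better
    QCMeasure("qc_3", lambda v: v >= 0.9),   # higher is better
    QCMeasure("qc_4", lambda v: v <= 0.03),  # lower is better
    QCMeasure("qc_5", lambda v: v <= 1.0),   # lower is better
]


def star_rating(values: Dict[str, float]) -> int:
    """Return the number of QC measures (0 to 5) that pass their threshold."""
    missing = [m.name for m in MEASURES if m.name not in values]
    if missing:
        raise KeyError(f"missing QC values: {missing}")
    return sum(1 for m in MEASURES if m.passes(values[m.name]))


def keep_sample(values: Dict[str, float], min_stars: int = 3) -> bool:
    """Decide whether a sample is retained for downstream analysis."""
    return star_rating(values) >= min_stars


if __name__ == "__main__":
    sample = {"qc_1": 35.0, "qc_2": 1.2, "qc_3": 0.95, "qc_4": 0.05, "qc_5": 0.8}
    print(star_rating(sample), keep_sample(sample))
```

A per-sample rating of this kind lets an analysis set its own cut-off: a study that is sensitive to sequencing artefacts might raise `min_stars`, while a study that needs a larger cohort might lower it.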
