Large-Scale Uniform Analysis of Cancer Whole Genomes in Multiple Computing Environments

The International Cancer Genome Consortium (ICGC)’s Pan-Cancer Analysis of Whole Genomes (PCAWG) project aimed to categorize somatic and germline variations in both coding and non-coding regions in over 2,800 cancer patients. To provide this dataset to the research working groups for downstream analysis, the PCAWG Technical Working Group marshalled ~800 TB of sequencing data from geographically distributed locations; developed portable software for uniform alignment, variant calling, artifact filtering and variant merging; performed the analysis in a geographically and technologically disparate collection of compute environments; and disseminated high-quality, validated consensus variants to the working groups. The PCAWG dataset has been mirrored to multiple repositories and can be located using the ICGC Data Portal. The PCAWG workflows are also available as Docker images through Dockstore, enabling researchers to replicate our analysis on their own data.
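Variant merging, the last of the software steps listed above, combines the calls produced by several independent pipelines into one call set. The simplest form of such a merge is an n-of-m vote: keep a variant only if enough callers report it. The sketch below illustrates that idea under stated assumptions: variants are keyed by (chromosome, position, ref, alt), the threshold of two callers and the caller names are hypothetical, and this is a generic voting example, not the PCAWG consensus procedure, which the abstract does not spell out.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Assumed variant key: (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus_variants(calls: Dict[str, Iterable[Variant]], min_callers: int = 2) -> Set[Variant]:
    """Return variants reported by at least `min_callers` distinct callers.

    `calls` maps a caller name to the variants it reported. Each caller's
    calls are deduplicated first so that one caller cannot vote twice.
    The default threshold of 2 is a hypothetical choice for illustration.
    """
    votes: Counter = Counter()
    for variants in calls.values():
        votes.update(set(variants))  # one vote per caller per variant
    return {v for v, n in votes.items() if n >= min_callers}


if __name__ == "__main__":
    # Toy calls from three hypothetical pipelines (not real PCAWG output).
    calls = {
        "caller_a": [("1", 1000, "C", "T"), ("1", 2000, "G", "A")],
        "caller_b": [("1", 1000, "C", "T"), ("2", 500, "A", "G")],
        "caller_c": [("1", 1000, "C", "T"), ("2", 500, "A", "G"), ("3", 42, "T", "C")],
    }
    for v in sorted(consensus_variants(calls)):
        print(v)
```

Deduplicating each caller's output before counting keeps a single pipeline from meeting the threshold on its own. A production merge would also need to normalize variant representation (for example, left-aligning indels) before keys from different callers can be compared.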

Wan Choi | Oliver Hofmann | Roland Eils | Lucila Ohno-Machado | Satoru Miyano | Nuno A. Fonseca | Paul Flicek | Adam P. Butler | Peter J. Campbell | Youngwook Kim | Claudiu Farcas | Marc D. Perry | Carolyn M. Hutter | Seiya Imoto | Christina K. Yung | Manuel Prinz | Brian D. O’Connor | Sergei Yakneen | Junjun Zhang | Kyle Ellrott | Kortine Kleinheinz | Naoki Miyoshi | Keiran M. Raine | Romina Royo | Gordon B. Saksena | Matthias Schlesner | Solomon I. Shorser | Miguel Vazquez | Joachim Weischenfeldt | Denis Yuen | Brandi N. Davis-Dusenbery | Vincent Ferretti | Robert L. Grossman | Olivier Harismendy | Hidewaki Nakagawa | Steven J. Newhouse | David Torrents | Lincoln D. Stein | Javier Bartolomé Rodriguez | Keith A. Boroevich | Rich Boyce | Angela N. Brooks | Alex Buchanan | Ivo Buchhalter | Niall J. Byrne | Andy Cafferkey | Zhaohong Chen | Sunghoon Cho | Peter Clapham | Francisco M. De La Vega | Jonas Demeulemeester | Michelle T. Dow | Lewis J. Dursi | Juergen Eils | Francesco Favero | Nodirjon Fayzullaev | Josep Ll. Gelpi | Gad Getz | Bob Gibson | Michael C. Heinold | Julian M. Hess | Jongwhi H. Hong | Thomas J. Hudson | Daniel Huebschmann | Barbara Hutter | Sinisa Ivkovic | Seung-Hyup Jeon | Wei Jiao | Jongsun Jung | Rolf Kabbe | Andre Kahles | Jules Kerssemakers | Hyunghwan Kim | Hyung-Lae Kim | Jihoon Kim | Jan O. Korbel | Michael Koscher | Antonios Koures | Milena Kovacevic | Chris Lawerenz | Ignaty Leshchiner | Dimitri G. Livitz | George L. Mihaiescu | Sanja Mijalkovic | Ana Mijalkovic Lazic | Hardeep K. Nahal | Mia Nastic | Jonathan Nicholson | David Ocana | Kazuhiro Ohi | Larsson Omberg | B.F. Francis Ouellette | Nagarajan Paramasivam | Todd D. Pihl | Montserrat Puiggròs | Petar Radovic | Esther Rheinbay | Mara W. Rosenberg | Charles Short | Heidi J. Sofia | Jonathan Spring | Adam J. Struck | Grace Tiao | Nebojsa Tijanic | Peter Van Loo | David Vicente | Jeremiah A. Wala | Zhining Wang | Johannes Werner | Ashley Williams | Youngchoon Woo | Adam J. Wright | Qian Xiang
