Butler enables rapid cloud-based analysis of thousands of human genomes

We present Butler, a computational tool that facilitates large-scale genomic analyses on public and academic clouds. Butler includes innovative anomaly detection and self-healing functions that improve the efficiency of data processing and analysis by 43% compared with current approaches. Butler enabled processing of a 725-terabyte cancer genome dataset from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project in a time-efficient and uniform manner.

Robert L. Grossman | Oliver Hofmann | Roland Eils | Lucila Ohno-Machado | Satoru Miyano | Wan Choi | Nuno A. Fonseca | Sebastian M. Waszak | Jan O. Korbel | Adam P. Butler | Gunnar Rätsch | Claudiu Farcas | Jia Liu | Marc D. Perry | Carolyn M. Hutter | Seiya Imoto | Christina K. Yung | Manuel Prinz | Brian D. O’Connor | Sergei Yakneen | Junjun Zhang | Kortine Kleinheinz | Naoki Miyoshi | Keiran M. Raine | Romina Royo | Solomon I. Shorser | Joachim Weischenfeldt | Denis Yuen | Olivier Harismendy | Hidewaki Nakagawa | Steven J. Newhouse | David Torrents | Keith A. Boroevich | Rich Boyce | Angela N. Brooks | Alex Buchanan | Ivo Buchhalter | Niall J. Byrne | Andy Cafferkey | Zhaohong Chen | Sunghoon Cho | Peter Clapham | Francisco M. De La Vega | Michelle T. Dow | Juergen Eils | Nodirjon Fayzullaev | Bob Gibson | Michael C. Heinold | Julian M. Hess | Jongwhi H. Hong | Thomas J. Hudson | Barbara Hutter | Seung-Hyup Jeon | Wei Jiao | Jongsun Jung | Rolf Kabbe | Andre Kahles | Hyunghwan Kim | Hyung-Lae Kim | Jihoon Kim | Michael Koscher | Antonios Koures | Ignaty Leshchiner | George L. Mihaiescu | Mia Nastic | Jonathan Nicholson | David Ocana | Kazuhiro Ohi | Nagarajan Paramasivam | Todd D. Pihl | Montserrat Puiggròs | Esther Rheinbay | Charles Short | Heidi J. Sofia | Adam J. Struck | Grace Tiao | Nebojsa Tijanic | David Vicente | Jeremiah A. Wala | Zhining Wang | Youngchoon Woo | Adam J. Wright | Qian Xiang | Vincent Ferretti | Paul Flicek | Gad Getz | Michael Gertz | Allison P. Heath | Gordon Saksena | Larsson Omberg | Peter Van Loo | Dimitri Livitz | Daniel Hübschmann | Mara Rosenberg | Youngwook Kim | Lincoln D. Stein | Peter J. Campbell | Miguel Vazquez | Brice Aminou | Javier Bartolome | Brandi N. Davis-Dusenbery | Jonas Demeulemeester | Lewis Jonathan Dursi | Kyle Ellrott | Francesco Favero | Josep Ll. Gelpi | Sinisa Ivkovic | Jules N. A. Kerssemakers | Milena Kovacevic | Chris Lawerenz | Sanja Mijalkovic | Ana Mijalkovic Lazic | Hardeep K. Nahal-Bose | B. F. Francis Ouellette | Petar Radovic | Matthias Schlesner | Jonathan Spring | Johannes Werner | Ashley Williams | Liming Yang
