A Million Cancer Genome Warehouse

Abstract: This white paper discusses the motivation for, and the issues surrounding, the development of a repository and associated computational infrastructure to house and process a million genomes to help battle cancer, which we call the Million Cancer Genome Warehouse. It is proposed as an example of an information commons and a computing system that will bring about precision medicine, coupling established clinical pathological indexes with state-of-the-art molecular profiling to create diagnostic, prognostic, and therapeutic strategies precisely tailored to each patient's individual requirements. The goal of the white paper is to stimulate discussion and so help reach consensus about the need to construct a Million Cancer Genome Warehouse and what its nature should be. To anticipate concerns, it provides thorough cost estimates and covers topics ranging from high-level health policy issues to low-level details of statistical analysis, data formats and structures, software design, and hardware construction and cost.
