Evaluation of Paired-End Sequencing Strategies for Detection of Genome Rearrangements in Cancer

Paired-end sequencing is emerging as a key technique for assessing genome rearrangements and structural variation on a genome-wide scale. This technique is particularly useful for detecting copy-neutral rearrangements, such as inversions and translocations, which are common in cancer and can produce novel fusion genes. Using both theoretical models and simulation, we address the question of how much sequencing is required to detect rearrangement breakpoints and to localize them precisely. We derive a formula for the probability that a fusion gene exists in a cancer genome given a collection of paired-end sequences from this genome. We use this formula to compute fusion gene probabilities in several breast cancer samples, and we find that we can accurately predict fusion genes in these samples with a relatively small number of large fragments. We further demonstrate how the ability to detect fusion genes depends on the distribution of gene lengths, and we evaluate how different parameters of a sequencing strategy affect breakpoint detection, breakpoint localization, and fusion gene detection, even in the presence of errors that suggest false rearrangements. These results will be useful in calibrating future cancer sequencing efforts, particularly large-scale studies of many cancer genomes that are enabled by next-generation sequencing technologies.
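
To give a feel for the "how much sequencing" question, the sketch below uses a standard clone-coverage approximation in the spirit of Lander-Waterman; it is not the fusion gene formula derived in the paper, which the abstract does not state. It assumes fragments of a fixed length placed uniformly at random on the genome, ignores chromosome-end effects and mapping errors, and treats a breakpoint as detected when at least one fragment spans it. The function names and the example values (a 3 Gb genome, 150 kb fragments) are illustrative assumptions.

```python
import math


def breakpoint_detection_probability(num_fragments, fragment_length, genome_length):
    """Probability that at least one fragment spans a fixed breakpoint.

    Assumes uniformly random fragment placement and a fixed fragment
    length (illustrative simplifications, not the paper's model).
    """
    # Chance that a single random fragment covers the breakpoint.
    per_fragment = fragment_length / genome_length
    # Complement of "no fragment covers the breakpoint".
    return 1.0 - (1.0 - per_fragment) ** num_fragments


def fragments_needed(target_probability, fragment_length, genome_length):
    """Smallest number of fragments reaching the target detection probability."""
    per_fragment = fragment_length / genome_length
    return math.ceil(
        math.log(1.0 - target_probability) / math.log(1.0 - per_fragment)
    )


if __name__ == "__main__":
    genome = 3_000_000_000  # assumed genome length in bp
    fragment = 150_000      # assumed fragment (clone) length in bp
    n = fragments_needed(0.99, fragment, genome)
    p = breakpoint_detection_probability(n, fragment, genome)
    print(f"{n} fragments give detection probability {p:.4f}")
```

Under this simplified model, the number of fragments needed falls roughly in proportion to the fragment length, which is consistent with the abstract's observation that a relatively small number of large fragments can suffice for detection. Localizing a breakpoint precisely is a separate question that this sketch does not address.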
