Assessing Matched Normal and Tumor Pairs in Next-Generation Sequencing Studies

Next-generation sequencing technology has revolutionized the study of cancers. Through matched normal-tumor pairs, it is now possible to identify germline and somatic mutations genome-wide. The generation and analysis of these data require rigorous quality checks and filtering, and current analytical pipelines are constantly being improved. We noted, however, that analyses of matched pairs implicitly assume that the sequenced samples are in fact matched, without any quality check such as those implemented in association studies. This assumption has serious implications, because the identification of germline and rare somatic variants depends on the normal sample truly being the matched pair of the tumor. Using a concept from genetics for measuring relatedness between individuals, we demonstrate that the matchedness of tumor-normal pairs can be quantified and should be included as part of a quality protocol in the analysis of sequenced data. Despite the mutational changes in cancer samples, matched tumor-normal pairs remain more similar in sequence than non-matched pairs. We also demonstrate that the approach can be used to assess the mutation landscape between individuals.
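As an illustration of how matchedness might be quantified, the sketch below computes the proportion of alleles shared identical-by-state (IBS) between a normal and a tumor sample. This is one common relatedness measure and is an assumption here; the abstract does not state which statistic the authors use. The function name `ibs_similarity`, the 0/1/2 genotype coding, and the toy genotype vectors are all hypothetical.

```python
from typing import Optional, Sequence


def ibs_similarity(
    normal: Sequence[Optional[int]],
    tumor: Sequence[Optional[int]],
) -> float:
    """Proportion of alleles shared identical-by-state (IBS).

    Genotypes are alternate-allele counts (0, 1 or 2) at the same sites
    in both samples; None marks a missing call. Sites missing in either
    sample are skipped. (Assumed coding, not taken from the paper.)
    """
    if len(normal) != len(tumor):
        raise ValueError("genotype vectors must cover the same sites")

    shared_alleles = 0
    called_sites = 0
    for g_normal, g_tumor in zip(normal, tumor):
        if g_normal is None or g_tumor is None:
            continue
        # 2 shared alleles for identical genotypes, 1 for a single-allele
        # difference, 0 for opposite homozygotes.
        shared_alleles += 2 - abs(g_normal - g_tumor)
        called_sites += 1

    if called_sites == 0:
        raise ValueError("no sites called in both samples")
    return shared_alleles / (2 * called_sites)


# Toy data (hypothetical): one normal sample, its own tumor with a single
# genotype change, and a tumor from a different individual.
normal = [0, 1, 2, 1, 0, 2, None, 1]
tumor_matched = [0, 1, 2, 2, 0, 2, 1, 1]
tumor_other = [2, 0, 1, 1, 2, 0, 1, 0]

matched_score = ibs_similarity(normal, tumor_matched)
other_score = ibs_similarity(normal, tumor_other)
print(f"matched pair:     {matched_score:.3f}")
print(f"non-matched pair: {other_score:.3f}")
```

In a quality protocol of this kind, a pair whose score falls well below that of known matched pairs would be flagged for review; any threshold would need to be calibrated on the data at hand.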
