Motivation: Applying a genomics assay to samples from a cohort is a common experimental design in cancer genomics studies. Collecting and analysing cancer sequencing data in the clinical setting is an elaborate process that may involve consenting patients, obtaining DNA samples (possibly several per patient), sequencing, and analysis. Many of these steps are manual, and at any stage a mistake can cause a DNA sample to be labelled incorrectly. However, few published methods identify such swaps specifically in cancer studies. Results: Here, we introduce a simple method, HYSYS, to estimate the relatedness of samples and to test for sample swaps and contamination. The test uses the concordance of homozygous SNPs between samples. It is motivated by the observation that homozygous germline population variants rarely change in the disease and are not affected by loss of heterozygosity. Our tools include visualization and a testing framework to flag possible sample swaps. We demonstrate the utility of this approach on a small cohort.
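To make the idea of homozygous SNP concordance concrete, the following is a minimal sketch in Python. It is not the HYSYS implementation. The genotype encoding (two-letter strings keyed by SNP identifier), the choice to compare only sites that are homozygous in both samples, the function names, and the 0.9 threshold are all assumptions made for illustration.

```python
def is_homozygous(genotype):
    """True for a two-allele genotype string with identical alleles, e.g. 'AA'."""
    return len(genotype) == 2 and genotype[0] == genotype[1]


def homozygous_concordance(a, b):
    """Fraction of sites homozygous in BOTH samples whose genotypes agree.

    Sites that are heterozygous in either sample are ignored, so a germline
    heterozygous site that becomes homozygous in the tumour through loss of
    heterozygosity does not count as a mismatch (illustrative assumption).
    Returns None when the samples share no comparable site.
    """
    sites = [s for s in a.keys() & b.keys()
             if is_homozygous(a[s]) and is_homozygous(b[s])]
    if not sites:
        return None
    agree = sum(1 for s in sites if a[s] == b[s])
    return agree / len(sites)


def flag_possible_swaps(samples, expected_pairs, threshold=0.9):
    """Return (x, y, concordance) for expected pairs scoring below threshold.

    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    flagged = []
    for x, y in expected_pairs:
        c = homozygous_concordance(samples[x], samples[y])
        if c is not None and c < threshold:
            flagged.append((x, y, c))
    return flagged


# Toy data: tumor1 matches normal1 apart from loss of heterozygosity at rs3;
# 'other' comes from a different individual.
normal1 = {"rs1": "AA", "rs2": "GG", "rs3": "AG", "rs4": "TT"}
tumor1 = {"rs1": "AA", "rs2": "GG", "rs3": "GG", "rs4": "TT"}
other = {"rs1": "CC", "rs2": "GG", "rs3": "AA", "rs4": "CC"}

samples = {"normal1": normal1, "tumor1": tumor1, "other": other}
pairs = [("normal1", "tumor1"), ("normal1", "other")]
flagged = flag_possible_swaps(samples, pairs)
```

In this toy data the pair normal1/tumor1 agrees at every site that is homozygous in both samples, while normal1/other agrees at only one of three such sites and is flagged. A real analysis would use many thousands of population SNPs and a statistically justified test rather than a fixed cut-off.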
[1] Jan Schröder, et al. HYSYS: have you swapped your samples? Bioinform., 2016.
[2] Jan G. Hengstler, et al. Identification of sample annotation errors in gene expression datasets. Archives of Toxicology, 2015.
[3] Kristian Cibulskis, et al. ContEst: estimating cross-contamination of human samples in next-generation sequencing data. Bioinform., 2011.
[4] A. Shlien, et al. Copy number variations and cancer. Genome Medicine, 2009.
[5] Manuel A. R. Ferreira, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics, 2007.
[6] L. Sun, et al. Statistical tests for detection of misspecified relationships by use of genome-screen data. American Journal of Human Genetics, 2000.