Structural variants account for the majority of human genetic variation, but are difficult to assess using current genomic sequencing technologies. Optical mapping technologies, which measure the size of chromosomal fragments between labeled markers, offer an alternative approach. As these technologies mature into clinical tools, a method is needed for determining the optimal strategy for sampling biological material so that a variant can be detected at a given threshold. Here we develop an optimization approach using a simple, yet realistic, model of the genomic mapping process based on the hypergeometric distribution and probabilistic concentration inequalities. Our approach is both computationally and analytically tractable and includes a novel approach to obtaining tail bounds for the hypergeometric distribution. We show that if a genomic mapping technology can sample most of the chromosomal fragments within a sample, comparatively little biological material is needed to detect a variant at high confidence.
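As an illustration of the kind of calculation a hypergeometric sampling model involves, the following is a minimal sketch and not the paper's actual method. It assumes a pool of `N` chromosomal fragments spanning a locus, of which `K` carry the variant, and it assumes that detection means observing at least `t` variant-carrying fragments among `n` fragments sampled without replacement. The function names and the detection rule are hypothetical, and the tail probability is computed exactly rather than through the concentration-inequality bounds the abstract refers to.

```python
from math import comb


def hypergeom_tail(N, K, n, t):
    """Exact P(X >= t) for X ~ Hypergeometric(N, K, n).

    N: total fragments in the pool (assumed model parameter)
    K: fragments carrying the variant
    n: fragments sampled without replacement
    t: minimum variant-carrying fragments needed to call the variant
    """
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes contribute nothing to the sum.
    favorable = sum(comb(K, k) * comb(N - K, n - k) for k in range(t, upper + 1))
    return favorable / total


def min_sample_size(N, K, t, confidence):
    """Smallest n with P(X >= t) >= confidence, or None if unreachable.

    Uses a plain linear scan; this is an illustrative choice, not an
    optimized search.
    """
    for n in range(t, N + 1):
        if hypergeom_tail(N, K, n, t) >= confidence:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    n_needed = min_sample_size(N=1000, K=50, t=3, confidence=0.99)
    print("fragments to sample:", n_needed)
```

The exact sum is practical for small pools. For large `N`, a closed-form tail bound of the kind described above would replace the summation, which is what makes the optimization analytically tractable.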