Mapping short DNA sequencing reads and calling variants using mapping quality scores (Supplementary Text)

In this supplementary text, an uppercase letter denotes a random variable, whereas a lowercase letter denotes a constant, a known value or a function. Let Σ = {‘A’,‘C’,‘G’,‘T’} be the alphabet of the four nucleotides. In sequencing, the true nucleotide is $B \in \Sigma$ and the one estimated by the base caller is $\hat{B}$. The base error $\epsilon_B$ is defined as $\epsilon_B = \Pr\{\hat{B} \neq B\}$, and the base quality $Q_B$ is $Q_B = -c \log \epsilon_B$, where $c$ is a scaling constant and $\log$ is the natural logarithm. For Phred quality, $c = 10/\log 10 \approx 4.343$, so that $Q_B = -10 \log_{10} \epsilon_B$. We have: