Achieving Accurate and Fast Base-calling by a Block Model of the Illumina Sequencing Data

Abstract

Base-calling accuracy is crucial for high-throughput DNA sequencing. Here we consider the deconvolution of the intensity data produced by Illumina systems. Existing base-calling software treats phasing and color crosstalk with fixed parameters, but we found that these processes are not constant across different parts of the data. We therefore model and process the data in blocks, an approach that both improves base-calling accuracy and reduces model complexity. We develop a block method based on a linear statistical model that balances accuracy against computational speed, and we analyze its performance under different blocking conditions.
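The abstract does not spell out the linear model, so the sketch below is only a rough illustration of the blocking idea under a simplifying assumption. It assumes a crosstalk-only model in which the observed four-channel intensity at one cycle is y = C x. Here C is a 4×4 crosstalk matrix (entry C[i][j] is the share of base j's dye signal seen in channel i), and x holds the dye signals for A, C, G, T. Phasing is ignored. The block names, matrix values, and the functions `solve`, `call_base` and `call_blocks` are all hypothetical. Each block's matrix is also taken as already estimated. The point is that each block is deconvolved with its own matrix instead of one fixed matrix for the whole run.

```python
BASES = "ACGT"


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Build the augmented matrix [A | b] as floats.
    a = [[float(v) for v in row] + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular crosstalk matrix")
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


def call_base(intensity, crosstalk):
    """Deconvolve one cycle's 4-channel intensity and call the strongest dye."""
    signal = solve(crosstalk, intensity)
    return BASES[max(range(4), key=lambda i: signal[i])]


def call_blocks(reads_by_block, crosstalk_by_block):
    """Call bases block by block, each block using its own crosstalk matrix."""
    calls = {}
    for block, intensities in reads_by_block.items():
        matrix = crosstalk_by_block[block]
        calls[block] = "".join(call_base(y, matrix) for y in intensities)
    return calls


if __name__ == "__main__":
    # Hypothetical per-block crosstalk matrices (rows: channels, columns: A, C, G, T).
    crosstalk = {
        "tile1": [[1.0, 0.3, 0.0, 0.0],
                  [0.3, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.2],
                  [0.0, 0.0, 0.2, 1.0]],
        "tile2": [[1.0, 0.8, 0.0, 0.0],
                  [0.5, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.6],
                  [0.0, 0.0, 0.7, 1.0]],
    }
    # Synthetic intensities: each is a scaled column of its block's matrix.
    reads = {
        "tile1": [[2.0, 0.6, 0.0, 0.0], [0.0, 0.0, 0.4, 2.0]],
        "tile2": [[0.0, 0.0, 1.0, 0.7]],
    }
    print(call_blocks(reads, crosstalk))  # prints {'tile1': 'AT', 'tile2': 'G'}
```

A full treatment in the spirit of the abstract would also fold phasing into the linear model and estimate each block's parameters from the data. Smaller blocks track local variation more closely but leave fewer clusters per block for estimation, which is one way to read the accuracy-versus-speed trade-off the abstract mentions.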