Nanopore DNA Sequencing Channel Modeling

In this paper, we first present a simplified DNA storage chain, partly inspired by classical digital communication principles, and review recent advances in error-correcting codes for DNA storage. We then introduce a novel DNA sequencing channel model for a new generation of nanopore sequencers, which we define and characterize by analyzing reported experimental results. This contribution opens new directions in the design of efficient error-correcting codes to improve the performance of DNA-based data storage.
