A modified watermark synchronisation code for robust embedding of data in DNA

DNA data embedding is an emerging field that aims to encode data in deoxyribonucleic acid (DNA). DNA is an inherently digital and noisy medium, subject to substitution, insertion and deletion mutations. Hence, encoding information in DNA can be seen as a particular case of digital communications in which biological constraints must be observed. In this paper we propose a modification of Davey and MacKay's watermark synchronisation code (unrelated to digital watermarking) to create an encoding procedure that is more biocompatible with the host organism than previous methods. In addition, when combined with a low-density parity-check (LDPC) code, the method provides near-optimum error correction. We also obtain the theoretical embedding capacity of DNA under substitution mutations for the increased biocompatibility constraint. This result, along with an existing bound on capacity for insertion and deletion mutations, is compared to the proposed algorithm's performance by means of Monte Carlo simulations.
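To make the channel view concrete, the following is a minimal sketch of a toy mutation channel and a Monte Carlo estimate of how often it changes the sequence length. It is not the paper's simulation setup: the function names `mutate` and `length_change_rate`, the independent per-base mutation probabilities `p_sub`, `p_ins` and `p_del`, and the uniform choice of inserted and substituted bases are all assumptions made for illustration.

```python
import random

BASES = "ACGT"


def mutate(seq, p_sub, p_ins, p_del, rng):
    """Pass a DNA string through a toy channel (assumed i.i.d. per-base mutations)."""
    out = []
    for base in seq:
        # Insertion: a uniformly random base is placed before the current one.
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))
        # Deletion: the current base is dropped.
        if rng.random() < p_del:
            continue
        # Substitution: the current base is replaced by one of the other three.
        if rng.random() < p_sub:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def length_change_rate(seq, p_sub, p_ins, p_del, trials, seed=0):
    """Monte Carlo estimate of the fraction of trials in which the length changes.

    A length change is only a crude proxy for loss of synchronisation: an
    insertion and a deletion in the same trial can cancel out in length.
    """
    rng = random.Random(seed)
    changed = sum(
        len(mutate(seq, p_sub, p_ins, p_del, rng)) != len(seq)
        for _ in range(trials)
    )
    return changed / trials


if __name__ == "__main__":
    host = "ACGTTGCAACGTTGCA"
    print(mutate(host, 0.05, 0.01, 0.01, random.Random(1)))
    print(length_change_rate(host, 0.05, 0.01, 0.01, trials=10_000))
```

Substitutions alone leave every base in its original position, whereas a single insertion or deletion shifts everything after it; this is the reason a synchronisation code is paired with a conventional error-correcting code such as LDPC.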
