On the embedding capacity of DNA strands under substitution, insertion, and deletion mutations

A number of methods have been proposed over the last decade for embedding information within deoxyribonucleic acid (DNA). Since a DNA sequence is conceptually equivalent to a one-dimensional digital signal, DNA data embedding (variously called DNA watermarking or DNA steganography) can be seen either as a traditional communications problem or as an instance of communications with side information at the encoder, similar to data hiding. These two cases correspond to the use of noncoding and coding DNA hosts, respectively, that is, DNA segments that cannot or can be translated into proteins. A limitation of existing DNA data embedding methods is that none of them has been designed according to optimal coding principles. Nor is it possible to evaluate how close to optimality these methods are without first determining the Shannon capacity of DNA data embedding. This capacity is the main topic studied in this paper, where we consider that DNA sequences may be subject to substitution, insertion, and deletion mutations.
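To give a feel for what a Shannon capacity figure looks like in this setting, the sketch below computes the capacity of a memoryless quaternary symmetric channel, used here as a simplified stand-in for a noncoding DNA host subject to substitution mutations only. The assumptions are ours and are not taken from the paper: each base mutates independently with total probability q, the three possible substitutions are equally likely, and insertions and deletions are ignored. The function name is hypothetical.

```python
import math


def quaternary_symmetric_capacity(q: float) -> float:
    """Capacity, in bits per base, of a 4-ary symmetric channel.

    Assumed model (illustrative only): a base is kept with probability
    1 - q and replaced by each of the other three bases with
    probability q / 3. Insertions and deletions are not modelled.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    # One row of the transition matrix; every row is a permutation of it.
    row = [1.0 - q] + [q / 3.0] * 3
    # Entropy of the row, skipping zero-probability terms.
    row_entropy = -sum(p * math.log2(p) for p in row if p > 0.0)
    # For a symmetric channel a uniform input is optimal, so the
    # capacity is log2(4) minus the row entropy.
    return 2.0 - row_entropy


if __name__ == "__main__":
    for q in (0.0, 0.01, 0.1, 0.75):
        print(f"q = {q:<5} capacity = {quaternary_symmetric_capacity(q):.4f} bits/base")
```

Under these assumptions an error-free strand carries 2 bits per base, and the capacity falls to zero at q = 0.75, where the output base is independent of the input. Insertions and deletions break the memoryless structure this formula relies on, so it does not extend to them.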
