DWT based coding DNA watermarking for DNA copyright protection

Abstract: DNA watermarking is a technique for copyright protection and ownership authentication of DNA sequences, and it helps secure private genetic information. This paper addresses frequency-domain watermarking of coding DNA sequences that provides mutation resistance, amino acid conservation, and security. Multimedia watermarking achieves robustness and invisibility mainly through frequency-domain representations. However, frequency-domain watermarking of a coding DNA sequence is significantly constrained, because the transformation and inverse transformation must be performed while completely conserving the amino acid sequence. In this paper, we present a coding DNA watermarking method in a lifting-based discrete wavelet transform (DWT) domain, focusing on the feasibility of frequency-domain watermarking for DNA sequences. Our method divides a coding DNA sequence into a number of subsequences and assigns each codon in a subsequence a numerical code based on the histogram ranks of the amino acids. It then calculates a set of DWT coefficients for each candidate subsequence of synonymous codons and selects the candidate whose DWT coefficients are optimal for embedding the watermark bits. Finally, it substitutes the selected candidate for the original subsequence of codons. To secure the watermark, our method generates the binary watermark with a nonlinear congruential pseudorandom number generator (NC-PRNG) and randomly selects the embeddable position in the DWT domain of the subsequence. We experimentally verified that our method not only ensures amino acid conservation and security but also resists a point mutation rate of approximately 18.5%.
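The embedding pipeline summarized above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: the partial synonym table `SYNONYMS`, the numerical code `8 * rank + index`, the one-level integer Haar lifting step, the parity rule used to carry a bit, and the "fewest codon changes" criterion standing in for the optimality measure. All function names are hypothetical. The watermark bit and the position `pos` are passed in directly, whereas the method described above derives them from the NC-PRNG.

```python
from collections import Counter
from itertools import product

# Assumed, partial synonymous-codon table (subset of the standard genetic code).
SYNONYMS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],  # alanine
    "K": ["AAA", "AAG"],                # lysine
    "F": ["TTT", "TTC"],                # phenylalanine
    "G": ["GGT", "GGC", "GGA", "GGG"],  # glycine
}
CODON_TO_AA = {c: aa for aa, cs in SYNONYMS.items() for c in cs}

def aa_ranks(codons):
    """Rank amino acids by frequency (histogram rank); ties broken by letter."""
    counts = Counter(CODON_TO_AA[c] for c in codons)
    ordered = sorted(counts, key=lambda a: (-counts[a], a))
    return {a: r for r, a in enumerate(ordered)}

def codon_code(codon, ranks):
    """Assumed numerical code: 8 * amino-acid rank + index among synonyms."""
    aa = CODON_TO_AA[codon]
    return 8 * ranks[aa] + SYNONYMS[aa].index(codon)

def haar_lift(x):
    """One-level integer Haar DWT via lifting (input length must be even)."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]             # predict step
    approx = [e + (d >> 1) for e, d in zip(even, detail)]   # update step
    return approx, detail

def haar_unlift(approx, detail):
    """Exact inverse of haar_lift."""
    even = [a - (d >> 1) for a, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    return [v for pair in zip(even, odd) for v in pair]

def extract_bit(codons, pos, ranks):
    """Read a bit as the parity of the detail coefficient at `pos` (assumed rule)."""
    _, detail = haar_lift([codon_code(c, ranks) for c in codons])
    return detail[pos] % 2

def embed_bit(codons, bit, pos, ranks):
    """Search synonymous subsequences for one carrying `bit`, with fewest changes."""
    aas = [CODON_TO_AA[c] for c in codons]
    best = None
    for cand in product(*(SYNONYMS[a] for a in aas)):
        if extract_bit(cand, pos, ranks) != bit:
            continue
        changes = sum(a != b for a, b in zip(cand, codons))
        if best is None or changes < best[0]:
            best = (changes, list(cand))
    return best[1] if best else None

seq = ["GCT", "AAA", "TTT", "GGT"]
ranks = aa_ranks(seq)
marked = embed_bit(seq, bit=1, pos=0, ranks=ranks)
```

Because every candidate is built only from synonymous codons, the amino acid sequence (and therefore the histogram ranks) is unchanged by embedding, so the extractor can recompute the same numerical codes from the watermarked sequence. The exhaustive search is exponential in the subsequence length, which is workable here only because the subsequences are short.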
