A biologically constrained encoding solution for long-term storage of images onto synthetic DNA

In the age of the digital media explosion, the amount of data being stored is increasing dramatically. However, although existing storage systems offer adequate capacity, they lack durability: hard disks, flash, tape and even optical storage have a limited lifespan, in the range of 5 to 20 years. Interestingly, recent studies have shown that synthetic DNA can be used to store digital data, making it a strong candidate for achieving data longevity. DNA’s biological properties allow a great amount of information to be stored in an extraordinarily small volume, while also promising reliable storage for centuries or even longer with no loss of information. However, encoding digital data onto DNA is not straightforward, because the decoder must be robust to sequencing noise. Furthermore, synthesizing DNA is an expensive process, so controlling the compression ratio by optimizing the rate-distortion trade-off is an important challenge. This work proposes a coding solution for the storage of digital images onto synthetic DNA. We developed a new encoding algorithm that generates a DNA code robust to the biological errors introduced by the synthesis and sequencing processes. Furthermore, thanks to an optimized allocation process, the solution is able to control the compression ratio and thus the length of the synthesized DNA strand. Results show an improvement in terms of coding potential compared to previous state-of-the-art works.