Data hiding in hard-copy text documents robust to print, scan and photocopy operations

This paper describes a method for hiding data inside printed text documents that is resilient to print/scan and photocopying operations. Using the principle of channel coding with side information, the embedder inserts a message into a text document while treating the content of the document as known interference. The data is embedded by making small changes to text characters before the document is printed. Using a simple correlation-based detector in conjunction with an error correction code, the hidden data can be extracted from a photocopy of the printed document. By enhancing the detector with an optical character recognition algorithm, the embedded data can be extracted even after multiple rounds of photocopying. Results from subjective tests show that the changes made by the embedding algorithm, while perceptible, are not obtrusive to a lay reader.