Blind Source Separation Techniques for Detecting Hidden Texts and Textures in Document Images

Blind Source Separation techniques, based both on Independent Component Analysis (ICA) and on second-order statistics, are presented and compared for extracting partially hidden texts and textures in document images. Barely perceivable features may occur, for instance, in ancient documents that were erased and then rewritten (palimpsests), or may be caused by show-through or seeping of ink from the reverse side, or by watermarks in the paper. Detecting these features can be of great importance to scholars and historians. In our approach, the document is modeled as the superposition of a number of source patterns, and a simplified linear mixture model is introduced to describe the relationship between these sources and multispectral views of the document. The problem of detecting the patterns that are barely perceivable in the visible color image is thus formulated as that of separating the various patterns in the mixtures. Examples from extensive experiments on real ancient documents are shown and discussed.
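To illustrate the linear mixture model, assume each of the n co-registered multispectral views is a weighted sum of the source patterns at every pixel r, i.e. x(r) = A s(r), with A an unknown n-by-n mixing matrix. The following is a minimal sketch of one second-order separation scheme in the spirit of AMUSE: the views are whitened using the zero-lag covariance, then rotated by the eigenvectors of a spatially lagged covariance. It is not necessarily the method used in this work; the function names `mix` and `amuse_separate`, the one-pixel horizontal lag, and the synthetic sources in the usage example are hypothetical choices made for illustration.

```python
import numpy as np

def mix(sources, A):
    """Linear instantaneous mixture: x_i(r) = sum_j A[i, j] * s_j(r).

    sources: (n_sources, H, W); A: (n_channels, n_sources)."""
    return np.tensordot(A, sources, axes=1)

def amuse_separate(views, lag=1):
    """Second-order BSS (AMUSE-style) using a horizontal pixel shift as the lag.

    views: (n_channels, H, W) array of co-registered views.
    Returns estimated sources, defined only up to scale, sign and permutation."""
    n, H, W = views.shape
    X = views.reshape(n, -1)
    X = X - X.mean(axis=1, keepdims=True)          # remove per-channel mean

    # Step 1: symmetric whitening from the zero-lag covariance.
    C0 = X @ X.T / X.shape[1]
    d, E = np.linalg.eigh(C0)
    Wh = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    Z = (Wh @ X).reshape(n, H, W)

    # Step 2: covariance between each whitened view and its shifted copy.
    left = Z[:, :, :-lag].reshape(n, -1)
    right = Z[:, :, lag:].reshape(n, -1)
    C_lag = left @ right.T / left.shape[1]
    C_lag = 0.5 * (C_lag + C_lag.T)                # symmetrize

    # Step 3: eigenvectors of the lagged covariance give the final rotation.
    _, U = np.linalg.eigh(C_lag)
    return (U.T @ Z.reshape(n, -1)).reshape(n, H, W)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W = 64, 64
    # Hypothetical sources: a slowly varying pattern and a spatially white one.
    smooth = np.tile(np.sin(2 * np.pi * np.arange(W) / 32.0), (H, 1))
    noise = rng.standard_normal((H, W))
    S_true = np.stack([smooth, noise])
    A = np.array([[0.8, 0.4], [0.3, 0.9]])         # hypothetical mixing matrix
    S_est = amuse_separate(mix(S_true, A))
    corr = np.corrcoef(S_est.reshape(2, -1), S_true.reshape(2, -1))[:2, 2:]
    print(np.round(np.abs(corr), 3))
```

The scheme separates sources only if their lagged autocorrelations differ, which is why the two synthetic patterns were chosen with very different spatial smoothness; ICA-based methods instead rely on non-Gaussianity of the sources.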