Virtual restoration and content analysis of ancient degraded manuscripts

In recent years, libraries and archives have carried out extensive campaigns to digitize the documentary heritage they hold, with the primary goal of ensuring the preservation and accessibility of this important part of humanity's cultural and historical patrimony. Besides supporting conservation, the availability of high-quality digital copies has increasingly stimulated the use of image processing techniques to perform a number of operations on documents and manuscripts without harming the often precious and fragile originals. Among these, virtual restoration tasks are crucial, as they facilitate the traditional work of philologists and paleographers and constitute a first step towards automatic analysis of the written contents. Here we report our experience in this field, taking as a case study the removal of one of the most frequent and damaging degradations affecting ancient manuscripts, namely bleed-through distortion. We show that blind source separation techniques are versatile tools for either canceling these unwanted interferences or isolating specific features for content analysis. Specialized algorithms, based on recto-verso models and sparse image representation, are then shown to perform a fine and selective removal of the degradation while preserving the original appearance of the manuscript.
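
As a rough illustration of the blind source separation idea mentioned above, the sketch below assumes a simple linear, instantaneous two-channel model in which the recto scan and the mirrored, already registered verso scan are each a weighted mix of the two underlying texts. It decorrelates the pair by symmetric whitening. The function name `symmetric_whitening` and the mixing model are assumptions made for this example; this is not the specialized recto-verso or sparse-representation algorithm reported here.

```python
import numpy as np

def symmetric_whitening(recto, verso_flipped):
    """Decorrelate two registered grayscale images of identical shape.

    Assumption (hypothetical model for this sketch): each observation is a
    linear mix of the recto and verso texts, and the verso has already been
    mirrored and aligned to the recto.
    """
    # Stack the two observations as rows of a 2 x N data matrix.
    x = np.stack([recto.ravel(), verso_flipped.ravel()]).astype(float)
    # Remove the mean of each channel.
    x -= x.mean(axis=1, keepdims=True)
    # 2 x 2 covariance of the observations.
    cov = np.cov(x)
    # Symmetric whitening matrix: the inverse square root of the covariance.
    eigvals, eigvecs = np.linalg.eigh(cov)
    w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    # Whitened channels are uncorrelated with unit variance.
    y = w @ x
    return y[0].reshape(recto.shape), y[1].reshape(recto.shape)
```

When the mixing is symmetric and the two texts are roughly uncorrelated with similar contrast, the whitened outputs approximate the individual texts up to scale. Real manuscripts usually depart from these assumptions (nonlinear ink seepage, misalignment), which is why more specialized models are needed.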
