Effective geometric restoration of distorted historical document for large-scale digitisation

Owing to storage conditions and the non-planar shape of the material, geometric distortion of the 2-D content is widespread in scanned document images. Effective geometric restoration of these distorted images considerably increases the character recognition rate in large-scale digitisation. For large-scale digitisation of historical books, geometric restoration solutions are expected to be accurate, generic, robust, unsupervised and reversible. However, most methods in the literature concentrate on improving restoration accuracy for a specific distortion effect rather than on applicability to large-scale digitisation. This paper proposes an effective mesh-based geometric restoration system, GRLSD, for large-scale digitisation of distorted historical documents. In this system, a dewarping tool based on automatic mesh generation geometrically models and corrects arbitrarily warped historical documents. An XML-based mesh recorder stores the mesh that describes the distortion, so that the restoration is reversible. A graphical user interface toolkit displays the mesh and allows it to be manipulated manually to improve geometric restoration accuracy. Experimental results show that the proposed automatic dewarping approach efficiently corrects arbitrarily warped historical documents and outperforms several state-of-the-art geometric restoration methods. With the XML mesh recorder and the GUI toolkit, users of the GRLSD system can flexibly monitor and correct ambiguous mesh points, which helps prevent damage to undistorted historical document images during large-scale digitisation.
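As an illustration of how a mesh recorder of this kind might keep a restoration reversible, the following minimal sketch writes a grid of control points to XML and reads it back unchanged. It is not the GRLSD schema: the element and attribute names (Mesh, Point, row, col, x, y) and the functions mesh_to_xml and xml_to_mesh are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET


def mesh_to_xml(mesh):
    """Serialise a rectangular grid of (x, y) control points to an XML string.

    mesh[r][c] is the image position of the grid node in row r, column c.
    The tag and attribute names are assumptions made for this sketch.
    """
    root = ET.Element("Mesh", rows=str(len(mesh)), cols=str(len(mesh[0])))
    for r, row in enumerate(mesh):
        for c, (x, y) in enumerate(row):
            # repr(float) gives a string that parses back to the same float.
            ET.SubElement(
                root, "Point",
                row=str(r), col=str(c),
                x=repr(float(x)), y=repr(float(y)),
            )
    return ET.tostring(root, encoding="unicode")


def xml_to_mesh(text):
    """Rebuild the grid of control points from the XML produced above."""
    root = ET.fromstring(text)
    rows, cols = int(root.get("rows")), int(root.get("cols"))
    mesh = [[None] * cols for _ in range(rows)]
    for p in root.iter("Point"):
        r, c = int(p.get("row")), int(p.get("col"))
        mesh[r][c] = (float(p.get("x")), float(p.get("y")))
    return mesh


# A 2 x 2 mesh over a slightly warped region (illustrative values only).
mesh = [
    [(0.0, 0.0), (10.0, 1.5)],
    [(0.5, 20.0), (10.25, 21.0)],
]
recorded = mesh_to_xml(mesh)
restored = xml_to_mesh(recorded)
```

Because the recorded mesh survives the round trip exactly, a point adjusted by hand (for instance in a GUI) can be written back to the same file, and the original image can still be recovered from the stored geometry rather than from the resampled output.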
