Segmentation Based Recovery of Arbitrarily Warped Document Images

Non-linear warping often appears in document images captured by a digital camera or a scanner, especially when the documents are digitized bound volumes. Arbitrarily warped documents may exhibit several slope changes along a text line, and even along the words of the same text line. This paper presents a novel segmentation-based technique for efficient restoration of arbitrarily warped document images. The proposed technique recovers the document by relying on (i) text line and word detection using a novel segmentation technique suited to warped documents, (ii) a first-draft binary image de-warping based on word rotation and translation according to upper and lower word baselines, and (iii) a recovery of the original warped image guided by the draft binary image de-warping result. Experimental results on several arbitrarily warped documents demonstrate the effectiveness of the proposed technique.
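Step (ii) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the baselines are estimated or combined, so the least-squares fit over per-column extreme pixels, the averaging of the two slopes, the choice of pivot, and all function names (`fit_slope`, `word_baselines`, `dewarp_word`) are assumptions made for illustration. A word is given as a list of foreground pixel coordinates in image convention (y grows downward).

```python
import math


def fit_slope(points):
    """Least-squares slope of y on x for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0  # single-column word: treat as horizontal
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def word_baselines(pixels):
    """Return (upper_slope, lower_slope) fitted to the top-most and
    bottom-most foreground pixel of every column (assumed estimator)."""
    top, bottom = {}, {}
    for x, y in pixels:
        top[x] = min(y, top.get(x, y))
        bottom[x] = max(y, bottom.get(x, y))
    return fit_slope(list(top.items())), fit_slope(list(bottom.items()))


def dewarp_word(pixels, target_baseline_y):
    """Rotate a word so its baselines become horizontal, then translate it
    vertically so its lowest pixel sits on target_baseline_y."""
    upper, lower = word_baselines(pixels)
    # Assumption: the word slope is the mean of the two baseline slopes.
    angle = math.atan((upper + lower) / 2.0)

    # Assumption: pivot on the lowest pixel of the leftmost column.
    x0 = min(x for x, _ in pixels)
    y0 = max(y for x, y in pixels if x == x0)

    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    rotated = []
    for x, y in pixels:
        dx, dy = x - x0, y - y0
        rotated.append((x0 + dx * cos_a - dy * sin_a,
                        y0 + dx * sin_a + dy * cos_a))

    # Translation: move the rotated word onto the target line.
    shift = target_baseline_y - max(y for _, y in rotated)
    return [(x, y + shift) for x, y in rotated]
```

Calling `dewarp_word(word_pixels, line_y)` for each detected word, with `line_y` taken from the text line the word belongs to, would yield a draft layout of straightened words; the output coordinates are real-valued and would still need resampling onto the pixel grid.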
