Border Noise Removal of Camera-Captured Document Images Using Page Frame Detection

Camera-captured document images usually contain two main types of marginal noise: textual noise (from neighboring pages) and non-textual noise (from the page surroundings and/or the binarization process). Both types degrade the performance of the preprocessing (dewarping) of camera-captured document images and of the subsequent document digitization and recognition processes. Page frame detection, a recently investigated area of document image processing, identifies the actual content area of a document image so that border noise can be removed. In this paper, we present a new technique for page frame detection in camera-captured document images. It uses information about both text and non-text content to find the page frame. We evaluate our algorithm on the DFKI-I (CBDAR 2007 Dewarping Contest) dataset. Experimental results show the effectiveness of our method in comparison to other state-of-the-art page frame detection approaches.
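
To make the role of the page frame concrete, the following minimal sketch shows how border noise can be removed once a page frame is known. It is not the detection technique presented in the paper: the frame is assumed to be given, and the function name `apply_page_frame`, the list-of-lists binary image, and the `(top, left, bottom, right)` frame layout are all hypothetical choices made for illustration.

```python
def apply_page_frame(image, frame):
    """Blank every pixel outside the page frame.

    image: list of rows, each a list of 0 (background) / 1 (ink) values.
    frame: (top, left, bottom, right), inclusive bounds of the content area.
    Both representations are assumptions made for this sketch.
    """
    top, left, bottom, right = frame
    return [
        [
            pixel if top <= r <= bottom and left <= c <= right else 0
            for c, pixel in enumerate(row)
        ]
        for r, row in enumerate(image)
    ]


# Toy 5x6 binary page: column 0 is a dark border strip (non-textual noise),
# column 5 holds stray ink from a neighboring page (textual noise), and the
# 2x2 block in the middle stands in for the actual page content.
page = [
    [1, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0],
]

# The frame is supplied by hand here; a page frame detector would estimate it.
cleaned = apply_page_frame(page, frame=(1, 2, 2, 3))
```

In this toy case only the four content pixels inside the frame survive, while both the border strip and the neighboring-page ink are cleared. The hard part, which the sketch deliberately leaves out, is estimating the frame itself from the image.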
