Automatic Borders Detection of Camera Document Images

When a document is captured with a digital camera, the resulting image is often framed by a noisy black border or includes noisy text regions from neighbouring pages. In this paper, we present a novel technique that enhances camera-captured document images by automatically detecting the document borders and cutting out both the noisy black borders and the noisy text regions that appear from neighbouring pages. Our methodology is based on projection profiles combined with a connected component labelling process. Signal cross-correlation is also used to verify the detected noisy text areas. Experimental results on several camera document images, mainly of historical documents, indicate the effectiveness of the proposed technique.
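
To make the projection-profile idea concrete, the following is a minimal Python sketch of how row and column profiles of a binary image could be used to trim a solid black frame. It is an illustration under stated assumptions, not the procedure of the paper: the connected component labelling and cross-correlation verification steps are omitted, and the function names, the `max_fill` threshold, and the toy `page` image are all hypothetical. The image is assumed to be a non-empty rectangular list of rows, with 1 for black (foreground) pixels and 0 for white.

```python
def projection_profiles(image):
    """Return (row_profile, col_profile): counts of black pixels per row and per column."""
    row_profile = [sum(row) for row in image]
    col_profile = [sum(col) for col in zip(*image)]
    return row_profile, col_profile


def trim_limits(profile, line_length, max_fill=0.8):
    """Scan inward from both ends of a profile, skipping lines that are mostly black.

    A line is treated as border when its black-pixel fraction exceeds max_fill
    (an assumed threshold, not a value taken from the paper).
    Returns (start, end) as a half-open range of lines to keep.
    """
    start = 0
    while start < len(profile) and profile[start] / line_length > max_fill:
        start += 1
    end = len(profile)
    while end > start and profile[end - 1] / line_length > max_fill:
        end -= 1
    return start, end


def crop_black_border(image, max_fill=0.8):
    """Crop rows and columns that the projection profiles flag as black border."""
    row_profile, col_profile = projection_profiles(image)
    height, width = len(image), len(image[0])
    top, bottom = trim_limits(row_profile, width, max_fill)
    left, right = trim_limits(col_profile, height, max_fill)
    return [row[left:right] for row in image[top:bottom]]


# Toy 6x8 binary page with a one-pixel black frame (hypothetical data).
page = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
cropped = crop_black_border(page)
```

In this sketch the frame rows and columns are entirely black, so they exceed the threshold and are dropped, while text rows with a lower black-pixel fraction are kept. Real camera images have irregular, noisy borders, which is presumably why the abstract pairs projection profiles with connected component labelling rather than relying on a fixed threshold alone.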
