Merge techniques for large multiple-pass scanned images

The most important step in automatic content conversion is preprocessing. A high-quality scan almost guarantees that the document's content will be extracted with a high confidence level. This paper describes preprocessing methods for large images that must be scanned in several pieces because they do not fit within the scanner area.
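The following Python sketch merges two overlapping pieces of a page that was scanned top to bottom. It makes several simplifying assumptions. The pieces are grayscale and equally wide. They are already aligned horizontally and are not rotated, so only the height of the vertical overlap has to be found. The sketch finds that height by exhaustive search, choosing the overlap with the lowest mean absolute pixel difference. It illustrates the problem and is not the method described in the paper. The function names `overlap_cost`, `find_overlap` and `merge_vertical` are hypothetical.

```python
from typing import List, Optional

# Hypothetical representation: a grayscale image as a list of rows,
# each row holding pixel intensities (0-255).
Image = List[List[int]]


def overlap_cost(top: Image, bottom: Image, k: int) -> float:
    """Mean absolute difference between the last k rows of `top`
    and the first k rows of `bottom` (lower means a better match)."""
    total = 0
    count = 0
    for i in range(k):
        row_a = top[len(top) - k + i]
        row_b = bottom[i]
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def find_overlap(top: Image, bottom: Image,
                 min_overlap: int = 1,
                 max_overlap: Optional[int] = None) -> int:
    """Try every candidate overlap height and return the one with the
    lowest cost; on ties the smallest overlap wins."""
    if not top or not bottom or not top[0] or len(top[0]) != len(bottom[0]):
        raise ValueError("pieces must be non-empty and equally wide")
    limit = min(len(top), len(bottom))
    if max_overlap is None:
        max_overlap = limit
    max_overlap = min(max_overlap, limit)
    if min_overlap < 1 or min_overlap > max_overlap:
        raise ValueError("invalid overlap range")
    best_k, best_cost = min_overlap, float("inf")
    for k in range(min_overlap, max_overlap + 1):
        cost = overlap_cost(top, bottom, k)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


def merge_vertical(top: Image, bottom: Image,
                   min_overlap: int = 1,
                   max_overlap: Optional[int] = None) -> Image:
    """Stitch `bottom` under `top`, dropping the duplicated overlap rows."""
    k = find_overlap(top, bottom, min_overlap, max_overlap)
    return [row[:] for row in top] + [row[:] for row in bottom[k:]]
```

For example, suppose a six-row image has distinct rows. Splitting it into `full[:4]` and `full[2:]` yields a two-row overlap, and `merge_vertical` rebuilds the original image. Real multi-pass scans also show skew, horizontal drift and noise. For such input, a normalized correlation score or feature-based registration is more robust than raw pixel differences.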
