Document Image Binarisation Using a Supervised Neural Network

Advances in digital technologies have allowed us to generate more images than ever. Scanned document images, which form a vital part of digital libraries and archives, are one example. Scanned degraded documents contain background noise and varying contrast and illumination. Document image binarisation must therefore be performed to separate the foreground layer from the background layer. Image binarisation is performed using either local adaptive thresholding or global thresholding, and local thresholding is generally considered the more successful. This paper presents a novel approach to global thresholding, in which a neural network is trained on the local threshold values of an image to determine an optimum global threshold value. That single value is then used to binarise the whole image. The proposed method is compared with five local thresholding methods, and the experimental results indicate that it is computationally efficient and capable of binarising scanned degraded documents with superior results.
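The abstract does not specify how the local thresholds are computed, what network architecture is used, or how the training targets are obtained. The Python sketch below is therefore only a hypothetical illustration of the general idea (local threshold values in, one global threshold out), not the authors' method. It assumes Niblack-style tile thresholds (mean + k·std with k = -0.2), summarises them as a 16-bin histogram, and trains a small one-hidden-layer network (`TinyMLP`, a hypothetical name) to regress a single global threshold. The training pages and their target thresholds (the midpoint of the background and ink grey levels) are synthetic stand-ins for real ground-truth data.

```python
import numpy as np


def local_thresholds(img, tile=32, k=-0.2):
    """Assumed Niblack-style local threshold (mean + k * std) per non-overlapping tile."""
    h, w = img.shape
    vals = []
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            block = img[y:y + tile, x:x + tile].astype(float)
            vals.append(block.mean() + k * block.std())
    return np.clip(np.array(vals), 0, 255)


def features(img, bins=16):
    """Normalised histogram of the local threshold values: the network's input."""
    hist, _ = np.histogram(local_thresholds(img), bins=bins, range=(0, 256))
    return hist / hist.sum()


class TinyMLP:
    """Hypothetical one-hidden-layer regressor: features -> global threshold / 255."""

    def __init__(self, n_in, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, n_hidden)
        self.b2 = 0.0

    def predict(self, X):
        return np.tanh(X @ self.W1 + self.b1) @ self.W2 + self.b2

    def fit(self, X, y, lr=0.1, epochs=2000):
        """Full-batch gradient descent on half the mean squared error."""
        n = len(X)
        for _ in range(epochs):
            hid = np.tanh(X @ self.W1 + self.b1)
            err = hid @ self.W2 + self.b2 - y
            # Backpropagate through tanh using W2 before it is updated.
            d_hid = np.outer(err, self.W2) * (1.0 - hid ** 2)
            self.W2 -= lr * hid.T @ err / n
            self.b2 -= lr * err.mean()
            self.W1 -= lr * X.T @ d_hid / n
            self.b1 -= lr * d_hid.mean(axis=0)
        return self


def synthetic_page(rng, size=128):
    """Synthetic degraded page: uneven background, dark strokes, Gaussian noise."""
    bg, ink = rng.uniform(150, 230), rng.uniform(20, 100)
    img = np.full((size, size), bg) + np.linspace(-20, 20, size)[None, :]
    for _ in range(40):
        y, x = rng.integers(0, size - 8, size=2)
        img[y:y + 3, x:x + 8] = ink
    img += rng.normal(0.0, 8.0, img.shape)
    target = (bg + ink) / 2  # stand-in "optimum" threshold for this page
    return np.clip(img, 0, 255).astype(np.uint8), target


rng = np.random.default_rng(1)
train = [synthetic_page(rng) for _ in range(60)]
X = np.array([features(page) for page, _ in train])
y = np.array([t / 255.0 for _, t in train])
net = TinyMLP(X.shape[1])
mse_before = np.mean((net.predict(X) - y) ** 2)
net.fit(X, y)
mse_after = np.mean((net.predict(X) - y) ** 2)

# Binarise an unseen page with the single predicted global threshold.
page, _ = synthetic_page(rng)
t_global = int(np.clip(round(255 * net.predict(features(page)[None, :])[0]), 0, 255))
binary = page > t_global  # True = background, False = ink
```

In this sketch, once the threshold has been predicted, binarising the whole page is a single vectorised comparison against one value rather than a per-pixel comparison against a varying local threshold.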
