Adaptive document image thresholding using foreground and background clustering

Two algorithms for document image thresholding are presented that are suitable for high-speed document scanning. Both are designed to operate on a portion of the image while the document is being scanned, so they fit a pipeline architecture and lend themselves to real-time implementation. The first algorithm is based on adaptive thresholding and uses local edge information to switch between global thresholding and adaptive local thresholding, where the local threshold is determined from the statistics of a local image window. The second algorithm tracks the foreground and background levels using clustering based on a variant of the K-means algorithm. The two approaches may be used independently or combined for improved performance. Results are presented illustrating the algorithms' performance on document and pictorial images.
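The abstract gives no formulas, so the following is only an illustrative sketch of the two ideas, not the authors' method. It assumes 8-bit gray levels, dark ink on a light background, and one scan line processed at a time. The function names, the window size, the edge test (local max minus local min), the mid-range local threshold, and the update rate are all hypothetical choices made for illustration.

```python
def edge_switched_threshold(row, global_t=128, window=3, edge_min=30):
    """Binarize one scan line, switching between a global and a local threshold.

    Assumption: an "edge" is declared when the local window's gray-level
    range reaches edge_min; the local threshold is then the window mid-range.
    Returns 1 for foreground (dark) pixels and 0 for background.
    """
    half = window // 2
    out = []
    for i, p in enumerate(row):
        win = row[max(0, i - half): i + half + 1]
        lo, hi = min(win), max(win)
        if hi - lo >= edge_min:
            t = (lo + hi) / 2      # local threshold from window statistics
        else:
            t = global_t           # flat region: fall back to global threshold
        out.append(1 if p < t else 0)
    return out


def track_levels(pixels, fg=0.0, bg=255.0, rate=0.5):
    """Track foreground/background levels with an online two-cluster update.

    Assumption: a sequential K-means-style rule with K=2. Each pixel is
    assigned to the nearer of the two level estimates, and only that
    estimate is moved toward the pixel by a fraction `rate`.
    Returns (labels, fg, bg), with label 1 for foreground.
    """
    labels = []
    for p in pixels:
        if abs(p - fg) <= abs(p - bg):
            fg += rate * (p - fg)
            labels.append(1)
        else:
            bg += rate * (p - bg)
            labels.append(0)
    return labels, fg, bg


if __name__ == "__main__":
    line = [200, 200, 200, 60, 60, 200, 200]
    print(edge_switched_threshold(line))
    print(track_levels([250, 240, 30, 20, 245, 25]))
```

Because both functions consume pixels in scan order and keep only a small window or two running estimates, they need no full-image buffer, which is the property that makes this style of processing fit a pipeline. One possible way to combine them, again as an assumption, is to use the midpoint of the tracked levels in place of the fixed `global_t`.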
