An Iterative Refinement Framework for Image Document Binarization with Bhattacharyya Similarity Measure

Background noise and illumination conditions are two primary factors degrading the performance of document image binarization. In this paper, we propose an iterative refinement framework to support robust binarization. Initially, an input image is transformed into a Bhattacharyya similarity matrix with a Gaussian kernel, which is subsequently converted into a binary image using a maximum entropy classifier. Then, we use a run-length histogram to estimate the character stroke width, which in turn determines the length of the filter window. After noise elimination, the output image is used for the next round of refinement, and the process terminates when the estimated stroke width becomes stable. Extensive experiments were conducted on the standard DIBCO datasets as well as on a new benchmark harvested from our user query log. Results show that our proposed method outperforms state-of-the-art methods and is more robust on low-quality images.
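The control flow of the refinement loop (binarize, estimate stroke width from a run-length histogram, filter noise with a window sized by that width, repeat until the width stops changing) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and rests on several stated assumptions: the Bhattacharyya similarity transform is omitted and raw 8-bit gray levels are thresholded directly; Kapur's maximum-entropy histogram threshold stands in for the paper's maximum entropy classifier, which may be a different model; the stroke width is taken as the most frequent horizontal/vertical ink run length; and the helper names `remove_noise` and `refine`, the window-density rule, and the "repaint rejected pixels with the brightest background level" feedback step are all hypothetical.

```python
import math
from collections import Counter


def max_entropy_threshold(gray):
    """Kapur-style maximum-entropy threshold for an 8-bit image (list of rows)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    prob = [h / total for h in hist]
    best_t, best_h = 0, -math.inf
    for t in range(255):
        p0, p1 = sum(prob[: t + 1]), sum(prob[t + 1:])
        if p0 == 0 or p1 == 0:
            continue
        # Sum of the entropies of the dark (ink) class and the bright class.
        h0 = -sum(p / p0 * math.log(p / p0) for p in prob[: t + 1] if p > 0)
        h1 = -sum(p / p1 * math.log(p / p1) for p in prob[t + 1:] if p > 0)
        if h0 + h1 > best_h:
            best_t, best_h = t, h0 + h1
    return best_t


def estimate_stroke_width(binary):
    """Most frequent length among horizontal and vertical runs of ink (1) pixels."""
    runs = Counter()
    for line in [list(r) for r in binary] + [list(c) for c in zip(*binary)]:
        length = 0
        for v in line + [0]:  # trailing 0 closes a run that touches the border
            if v:
                length += 1
            elif length:
                runs[length] += 1
                length = 0
    return runs.most_common(1)[0][0] if runs else 0


def remove_noise(binary, width):
    """Hypothetical window filter: keep an ink pixel only if its window,
    sized by the stroke width, holds at least `width` ink pixels."""
    half = max(width, 1) // 2
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            ink = sum(binary[j][i]
                      for j in range(max(0, y - half), min(h, y + half + 1))
                      for i in range(max(0, x - half), min(w, x + half + 1)))
            out[y][x] = 1 if ink >= max(width, 1) else 0
    return out


def refine(gray, max_rounds=10):
    """Binarize, estimate stroke width, denoise; repeat until the width is stable."""
    gray = [list(r) for r in gray]
    prev_width = None
    for _ in range(max_rounds):
        t = max_entropy_threshold(gray)
        binary = [[1 if v <= t else 0 for v in row] for row in gray]
        width = estimate_stroke_width(binary)
        binary = remove_noise(binary, width)
        if width == prev_width:
            break  # stroke width is stable: stop refining
        prev_width = width
        # Hypothetical feedback: ink pixels rejected as noise are repainted with
        # the brightest background level before the next round.
        fill = max((v for row in gray for v in row if v > t), default=255)
        gray = [[fill if v <= t and not b else v for v, b in zip(g_row, b_row)]
                for g_row, b_row in zip(gray, binary)]
    return binary, width
```

As a usage example, a 20×20 page at gray level 200 carrying a vertical stroke three pixels wide plus a few isolated dark specks converges in two rounds: the first round estimates a width of 3 and the window filter drops the specks, and the second round re-estimates the same width, so the loop stops. Using width stability as the stopping rule, as the abstract describes, means the loop halts once noise removal no longer shifts the run-length statistics.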