iDocChip: A Configurable Hardware Architecture for Historical Document Image Processing

End-to-end Optical Character Recognition (OCR) systems are widely used to convert document images into machine-readable text. Commercial and open-source OCR systems (e.g., ABBYY, OCRopus, and Tesseract) have traditionally been optimized for contemporary documents such as books, letters, memos, and other end-user documents. However, these systems perform poorly on historical document images, which contain degradations such as non-uniform shading, bleed-through, and irregular layout that rarely occur in contemporary documents. The open-source anyOCR system is an end-to-end OCR pipeline that combines state-of-the-art techniques for digitizing degraded historical archives with high accuracy. This high accuracy, however, comes at the cost of high computational complexity, which results in 1) long runtimes that limit the digitization of large collections of historical archives and 2) high energy consumption, the most critical limiting factor for portable devices with constrained energy budgets. We therefore target energy-efficient, high-throughput acceleration of the anyOCR pipeline. Since general-purpose computing platforms fail to meet these requirements, a custom hardware design is mandatory. In this paper, we present a new concept named iDocChip: a portable hybrid hardware-software FPGA-based accelerator characterized by a small footprint, high power efficiency that allows its use in portable devices, and high throughput that makes it possible to process large collections of historical archives in real time without affecting accuracy. Here, we focus on binarization, the second most critical step in the anyOCR pipeline after the text-line recognizer that we presented in a previous publication [21].
The anyOCR system uses a Percentile-Based Binarization (PBB) method that is well suited to overcoming degradations such as non-uniform shading and bleed-through. To the best of our knowledge, we propose the first hardware architecture for the PBB technique. Based on this new architecture, we present a hybrid hardware-software FPGA-based accelerator that outperforms the existing anyOCR software implementation running on an Intel Core i7-4790T by a factor of 21 in runtime, while achieving an energy efficiency of 10 images/J, higher than that achieved by low-power embedded processors, with negligible loss of recognition accuracy.
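To illustrate the general idea behind percentile-based binarization, the following sketch thresholds each pixel against a local percentile of its neighborhood, which estimates the (possibly non-uniform) background level. This is a minimal illustrative version: the window size, percentile, and offset values below are assumptions for demonstration, not the parameters of anyOCR's PBB implementation.

```python
import numpy as np
from scipy.ndimage import percentile_filter

def percentile_binarize(gray, window=51, percentile=80, offset=0.9):
    """Binarize a grayscale image against a local percentile estimate.

    For each pixel, the local `percentile` of intensities in a
    `window` x `window` neighborhood approximates the background level;
    pixels darker than `offset` times that estimate are marked as ink.
    Because the background is estimated locally, a smooth shading
    gradient does not shift the threshold globally.
    """
    background = percentile_filter(gray.astype(np.float32), percentile,
                                   size=window)
    return (gray < offset * background).astype(np.uint8)  # 1 = ink, 0 = page
```

Because the threshold adapts to each neighborhood, text remains separable from the page even when absolute intensities vary across the image, which is why this family of methods handles non-uniform shading better than a single global threshold.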

[1] Andreas Dengel, et al. anyOCR: An Open-Source OCR System for Historical Archives, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017.

[2] Didier Stricker, et al. A comparison of 1D and 2D LSTM architectures for the recognition of handwritten Arabic, Electronic Imaging, 2015.

[3] Yung-Sheng Chen, et al. Adaptive thresholding algorithm and its hardware implementation, Pattern Recognition Letters, 1994.

[4] Michael J. Fischer, et al. The String-to-String Correction Problem, JACM, 1974.

[5] Efstathios Stamatatos, et al. Adaptive Binarization of Historical Document Images, 18th International Conference on Pattern Recognition (ICPR'06), 2006.

[6] Norbert Wehn, et al. Hardware architecture of Bidirectional Long Short-Term Memory Neural Network for Optical Character Recognition, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.

[7] Alex Graves, et al. Supervised Sequence Labelling with Recurrent Neural Networks, Studies in Computational Intelligence, 2012.

[8] M. Hassan Najafi, et al. A Fast Fault-Tolerant Architecture for Sauvola Local Image Thresholding Algorithm Using Stochastic Computing, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2016.

[9] L. Álvarez, et al. Signal and image restoration using shock filters and anisotropic diffusion, 1994.

[10] Matti Pietikäinen, et al. Adaptive document binarization, Proceedings of the Fourth International Conference on Document Analysis and Recognition, 1997.

[11] Syed Saqib Bukhari, et al. Robust Binarization of Stereo and Monocular Document Images Using Percentile Filter, CBDAR, 2013.

[12] Andy C. Downton, et al. A comparison of binarization methods for historical archive documents, Eighth International Conference on Document Analysis and Recognition (ICDAR'05), 2005.

[13] Rahul Sharma, et al. Parallel Implementation of Souvola's Binarization Approach on GPU, 2011.

[14] Brij Mohan Singh, et al. Parallel Implementation of Otsu's Binarization Approach on GPU, 2011.

[15] Nader Karimi, et al. Hardware design for binarization and thinning of fingerprint images, arXiv, 2017.

[16] Naresh Kumar Garg, et al. Binarization Techniques used for Grey Scale Images, 2013.

[17] Didier Stricker, et al. Binarization-free OCR for historical documents using LSTM networks, 2015 13th International Conference on Document Analysis and Recognition (ICDAR), 2015.

[18] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, ICML, 2006.

[19] Thomas M. Breuel, et al. High-Performance OCR for Printed English and Fraktur Using LSTM Networks, 2013 12th International Conference on Document Analysis and Recognition, 2013.

[20] N. Otsu. A threshold selection method from gray level histograms, 1979.

[21] Nicole Vincent, et al. Comparison of Niblack inspired binarization methods for ancient documents, Electronic Imaging, 2009.