H-DocPro: a document image processing platform for historical documents

In this paper, we introduce H-DocPro, a publicly available document image processing platform for historical documents. H-DocPro is a result of our recent and ongoing research on historical document image processing and has been developed in order to monitor the successive application of several new or state-of-the-art document image processing methods. It is an open-architecture software platform that allows several document image processing modules and methods (e.g. binarization, image enhancement, page split) to be combined in an easy-to-define processing workflow. We provide detailed information on how to use H-DocPro, on the available modules and methods, and on how users can add their own components by exploiting the open architecture of the platform. Representative examples and experimental results on large sets of historical document images demonstrate the efficiency of the H-DocPro methods.
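To make the idea of an open-architecture workflow concrete, the following is a minimal sketch of how pluggable modules could be registered and chained. It is not the H-DocPro API: the names `REGISTRY`, `register`, `run_workflow`, `binarize` and `page_split` are hypothetical, the fixed global threshold and the split at the middle column are simple stand-ins for the platform's actual methods, and images are modelled as plain lists of grayscale rows.

```python
from typing import Callable, Dict, List

Image = List[List[int]]                        # grayscale rows, values 0..255
Module = Callable[[List[Image]], List[Image]]  # a module maps pages to pages

# Hypothetical module registry: new components are added by registering them.
REGISTRY: Dict[str, Module] = {}


def register(name: str) -> Callable[[Module], Module]:
    """Decorator that makes a module available to workflows under `name`."""
    def wrap(fn: Module) -> Module:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("binarize")
def binarize(pages: List[Image], threshold: int = 128) -> List[Image]:
    # Assumption: a fixed global threshold, used only for illustration.
    return [[[0 if px < threshold else 255 for px in row] for row in page]
            for page in pages]


@register("page_split")
def page_split(pages: List[Image]) -> List[Image]:
    # Assumption: split each double-page image at its middle column.
    out: List[Image] = []
    for page in pages:
        mid = len(page[0]) // 2
        out.append([row[:mid] for row in page])
        out.append([row[mid:] for row in page])
    return out


def run_workflow(steps: List[str], pages: List[Image]) -> List[Image]:
    """Apply the named modules one after another, in the given order."""
    for name in steps:
        pages = REGISTRY[name](pages)
    return pages


# A tiny 2x4 "double page" scan pushed through a two-step workflow.
scan = [[10, 200, 30, 250],
        [220, 15, 240, 20]]
result = run_workflow(["binarize", "page_split"], [scan])
```

In this sketch a workflow is just an ordered list of module names, so adding a component amounts to writing one function and registering it; the existing modules and the workflow runner stay unchanged.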
