SDK Reinvented: Document Image Analysis Methods as RESTful Web Services

Document Image Analysis (DIA) systems are becoming ever more advanced, but also more complex, both computationally and logically. This makes it harder to integrate existing state-of-the-art approaches into new research or into practical workflows. Software is currently shared in one of two ways. Either the source code is published, which leaves the burden to the integrator, or a Software Development Kit (SDK) is created, which is often restricted to a single programming language. We present DIVAServices, a framework for sharing and accessing DIA methods within the research community and beyond. We provide access to the methods through a RESTful web service architecture, so the binaries of each method need to be maintained on only one system. To use an algorithm, a developer only needs to send a simple HTTP request containing the image data and the parameters for the method. The computed results are returned in a format that allows seamless integration into any kind of workflow or further processing. Furthermore, DIVAServices is open-source, which enables other research groups or libraries to host their own instance in their own environment. Using this framework, future DIA systems can be built on the shoulders of well-tested algorithms that are accessible to everyone.
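The abstract does not specify the endpoint layout or the payload schema. The following Python sketch is therefore only an illustration of what such a request could look like, and several parts of it are hypothetical: the base URL, the method path, and the JSON field names `parameters`, `data`, and `image`. In this sketch, the image is base64-encoded and bundled with the method parameters into a JSON body, which is sent with POST to the URL of the method. The response is also assumed to be JSON.

```python
import base64
import json
import urllib.request

# Hypothetical base URL: the real DIVAServices host and route names are not
# given in the abstract, so this value is an assumption.
BASE_URL = "http://divaservices.example.org"


def build_method_request(method_path, image_bytes, parameters):
    """Build (but do not send) a POST request that passes image data and
    method parameters to a DIA method exposed as a RESTful resource."""
    payload = {
        # Assumed field names; adapt them to the schema of the actual service.
        "parameters": parameters,
        "data": [{"image": base64.b64encode(image_bytes).decode("ascii")}],
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{method_path.strip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_method(request, timeout=60.0):
    """Send the request and decode the (assumed JSON) result from the service."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Consider a hypothetical method path `binarization/otsu` with a parameter dictionary `{"threshold": 128}`. Passing these to `build_method_request` and then to `call_method` would return the decoded result, provided that a service instance is reachable at `BASE_URL`. The base URL is kept in a single constant because DIVAServices can be self-hosted. To use another group's instance, only that value needs to change.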
