Distributed Optical Character Recognition for Old Romanian Prints

Modern research infrastructures make it possible to disseminate old prints relevant to a particular cultural area widely and to make them machine-processable. This paper outlines the architecture of a distributed optical character recognition environment customized for a collection of scanned books in old Romanian, and presents the preliminary results of our experiments.
