Paper information - Distributed Optical Character Recognition for Old Romanian Prints

Modern research infrastructures make it possible to disseminate old prints relevant to a particular cultural area widely and to process them by machine. This paper outlines the architecture of a distributed environment for optical character recognition, customized for a collection of scanned books in old Romanian, and presents the preliminary results of our experiments.

Dana Petcu | Daniel Pop | Bogdan Irimie
