Distributed Optical Character Recognition for Old Romanian Prints
Modern research infrastructures make it possible to disseminate old prints relevant to a particular cultural area widely and to make them machine-processable. This paper outlines the architecture of a distributed environment for optical character recognition (OCR), customized for a collection of scanned books in old Romanian, and presents the preliminary results of our experiments.