On OCR ground truths and OCR post-correction gold standards, tools and formats
[1] Martin Reynaert, et al. FoLiA: A practical XML Format for Linguistic Annotation - a descriptive and comparative study, 2014, CLIN 2014.
[2] Thomas M. Breuel. The hOCR Microformat for OCR Workflow and Results, 2007, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007).
[3] Tomasz Parkoła, et al. Report on the comparison of Tesseract and ABBYY FineReader OCR engines, 2012.
[4] W. Bruce Croft, et al. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2013.
[5] Martin Reynaert. Character confusion versus focus word-based correction of spelling and OCR variants in corpora, 2010, International Journal on Document Analysis and Recognition (IJDAR).
[6] Jesse de Does, et al. Lexicon-supported OCR of eighteenth century Dutch books: a case study, 2013, Electronic Imaging.
[7] Antske Fokkens, et al. Offspring from Reproduction Problems: What Replication Failure Teaches Us, 2013, ACL.
[8] Martin Reynaert, et al. All, and only, the Errors: more Complete and Consistent Spelling and OCR-Error Correction Evaluation, 2008, LREC.
[9] Maarten de Rijke, et al. Feeding the Second Screen: Semantic Linking based on Subtitles, 2013, DIR.
[10] R. Manmatha, et al. A Fast Alignment Scheme for Automatic OCR Evaluation of Books, 2011, 2011 International Conference on Document Analysis and Recognition.
[11] Iris Hendrickx, et al. Historical spelling normalization. A comparison of two statistical methods: TICCL and VARD2, 2012.
[12] K. Vis. Subjectivity in news discourse: A corpus linguistic analysis of informalization, 2011.
[13] Vladimir I. Levenshtein, et al. Binary codes capable of correcting deletions, insertions, and reversals, 1965.
[14] Beatrice Alex, et al. Digitised historical text: Does it have to be mediOCRe?, 2012, KONVENS.
[15] Martin Reynaert. Synergy of Nederlab and , 2014, LREC.