[1] Rose Holley,et al. How Good Can It Get? Analysing and Improving OCR Accuracy in Large Scale Historic Newspaper Digitisation Programs , 2009, D Lib Mag..
[2] Kimmo Kettunen,et al. Keep, Change or Delete? Setting up a Low Resource OCR Post-correction Framework for a Digitized Old Finnish Newspaper Collection , 2015, IRCDL.
[3] Fachgebiet Wissensbasierte. Unsupervised Post-Correction of OCR Errors , 2010 .
[4] Irit Askira Gelman,et al. A "quick and dirty" website data quality indicator , 2008, WICOW '08.
[5] Hartmut Walravens. A Nordic Digital Newspaper Library , 2006 .
[6] Kalervo Järvelin,et al. A Dictionary- and Corpus-Independent Statistical Lemmatizer for Information Retrieval in Low Resource Languages , 2010, CLEF.
[7] Kimmo Kettunen,et al. Information retrieval from historical newspaper collections in highly inflectional languages: A query expansion approach , 2016, J. Assoc. Inf. Sci. Technol..
[8] Hartmut Walravens. Connecting to the Past – Newspaper Digitisation in the Nordic Countries , 2006 .
[9] Kimmo Kettunen,et al. Measuring Lexical Quality of a Historical Finnish Newspaper Collection ― Analysis of Garbled OCR Data with Basic Language Technology Tools and Means , 2016, LREC.
[10] Kent Fitch,et al. Correcting noisy OCR: context beats confusion , 2014, DATeCH '14.
[11] Edwin Klijn. The Current State-of-art in Newspaper Digitization: A Market Perspective , 2008, D Lib Mag..
[12] Anni Järvelin,et al. Information retrieval from historical newspaper collections in highly inflectional languages , 2016 .
[13] Martin Volk,et al. Reducing OCR Errors in Gothic-Script Documents , 2011, ERCIM News.
[14] Ulrich Reffle,et al. Unsupervised profiling of OCRed historical documents , 2013, Pattern Recognit..
[15] Klaus U. Schulz,et al. Orthographic Errors in Web Pages: Toward Cleaner Web Corpora , 2006, Computational Linguistics.
[16] Daniel McNamara,et al. Mining for the Meanings of a Murder: The Impact of OCR Quality on the Use of Digitized Historical Newspapers , 2014, Digit. Humanit. Q..
[17] Timo Honkela,et al. Analyzing and Improving the Quality of a Historical News Collection using Language Technology and Statistical Machine Learning Methods , 2014 .
[18] Ricardo Baeza-Yates,et al. On measuring the lexical quality of the web , 2012, WebQuality '12.
[19] Daniel P. Lopresti. Optical character recognition errors and their effects on natural language processing , 2009, International Journal on Document Analysis and Recognition (IJDAR).
[20] Jacques Savoy,et al. Comparative information retrieval evaluation for scanned documents , 2011 .
[21] Kimmo Kettunen,et al. Exporting Finnish Digitized Historical Newspaper Contents for Offline Use , 2016, D Lib Mag..
[22] Ellen M. Voorhees,et al. The TREC-5 Confusion Track: Comparing Retrieval Methods for Scanned Text , 2000, Information Retrieval.
[23] Lynda Hardman,et al. Impact Analysis of OCR Quality on Research Tasks in Digital Archives , 2015, TPDL.
[24] Peter Schäuble,et al. Information Retrieval can Cope with Many Errors , 2000, Information Retrieval.
[25] Esa Toom,et al. Kotimaisten kielten tutkimuskeskus , 2004 .
[26] Rico Sennrich,et al. Strategies for Reducing and Correcting OCR Errors , 2011, Language Technology for Cultural Heritage.
[27] Kazem Taghva,et al. Evaluation of model-based retrieval effectiveness with OCR text , 1996, TOIS.
[28] Geoffrey Sampson,et al. Word frequency distributions , 2002, Computational Linguistics.
[29] Martin Reynaert,et al. Non-interactive OCR Post-correction for Giga-Scale Digitization Projects , 2008, CICLing.
[30] Simon Tanner,et al. Measuring Mass Text Digitization Quality and Usefulness: Lessons Learned from Assessing the OCR Accuracy of the British Library's 19th Century Online Newspaper Archive , 2009, D Lib Mag..