Statistical learning for OCR error correction
Evangelos E. Milios | Aminul Islam | Jie Mei | Abidalrahman Mohammad | Yajing Wu