Building a Wikipedia N-GRAM Corpus
[1] Kazem Taghva, et al. OCRSpell: an interactive spelling correction system for OCR errors in text, 2001, International Journal on Document Analysis and Recognition.
[2] Stefan Evert, et al. Google Web 1T 5-Grams Made Easy (but not for the computer), 2010, WAC@NAACL-HLT.
[3] Kazem Taghva, et al. Reproducible Research in Document Analysis and Recognition, 2018.
[4] S. C. Rambaud, et al. A measure of inconsistencies in intertemporal choice, 2019.
[5] Cyril Nicaud, et al. Merge Strategies: from Merge Sort to TimSort, 2015.
[6] Phil Bagwell, et al. Ideal Hash Trees, 2001.
[7] Vladimir I. Levenshtein, et al. Binary codes capable of correcting deletions, insertions, and reversals, 1965.
[8] Fonseca Cacho, et al. Improving OCR Post Processing with Machine Learning Tools, 2019.
[9] Kazem Taghva, et al. The State of Reproducible Research in Computer Science, 2020.
[10] Jesús Peral, et al. MergedTrie: Efficient textual indexing, 2019, PloS one.
[11] Hugh E. Williams, et al. Burst tries: a fast, efficient data structure for string keys, 2002, TOIS.
[12] Peter Brass. Advanced Data Structures, 2008.
[13] Diana Inkpen, et al. Real-Word Spelling Correction using Google Web 1T 3-grams, 2009, EMNLP.
[14] Rene De La Briandais. File searching using variable length keys, 1959, IRE-AIEE-ACM Computer Conference.
[15] Kazem Taghva, et al. Using the Google Web 1T 5-Gram Corpus for OCR Error Correction, 2019, 16th International Conference on Information Technology-New Generations (ITNG 2019).
[16] Kazem Taghva, et al. Recognizing acronyms and their definitions, 1999, International Journal on Document Analysis and Recognition.
[17] Donald Ervin Knuth, et al. The Art of Computer Programming, 1968.
[18] Justin Zobel, et al. Redesigning the string hash table, burst trie, and BST to exploit cache, 2011, JEAL.