Building a Wikipedia N-GRAM Corpus

[1] Kazem Taghva, et al. OCRSpell: an interactive spelling correction system for OCR errors in text, 2001, International Journal on Document Analysis and Recognition.

[2] Stefan Evert, et al. Google Web 1T 5-Grams Made Easy (but not for the computer), 2010, WAC@NAACL-HLT.

[3] Kazem Taghva, et al. Reproducible Research in Document Analysis and Recognition, 2018.

[4] S. C. Rambaud, et al. A measure of inconsistencies in intertemporal choice, 2019.

[5] Cyril Nicaud, et al. Merge Strategies: from Merge Sort to TimSort, 2015.

[6] Phil Bagwell. Ideal Hash Trees, 2001.

[7] Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals, 1965.

[8] Fonseca Cacho, et al. Improving OCR Post Processing with Machine Learning Tools, 2019.

[9] Kazem Taghva, et al. The State of Reproducible Research in Computer Science, 2020.

[10] Jesús Peral, et al. MergedTrie: Efficient textual indexing, 2019, PLoS ONE.

[11] Hugh E. Williams, et al. Burst tries: a fast, efficient data structure for string keys, 2002, TOIS.

[12] Peter Brass. Advanced Data Structures, 2008.

[13] Diana Inkpen, et al. Real-Word Spelling Correction using Google Web 1T 3-grams, 2009, EMNLP.

[14] Rene De La Briandais. File searching using variable length keys, 1959, IRE-AIEE-ACM Computer Conference.

[15] Kazem Taghva, et al. Using the Google Web 1T 5-Gram Corpus for OCR Error Correction, 2019, 16th International Conference on Information Technology-New Generations (ITNG 2019).

[16] Kazem Taghva, et al. Recognizing acronyms and their definitions, 1999, International Journal on Document Analysis and Recognition.

[17] Donald Ervin Knuth. The Art of Computer Programming, 1968.

[18] Justin Zobel, et al. Redesigning the string hash table, burst trie, and BST to exploit cache, 2011, JEAL.