Paper information - Generating Search Term Variants for Text Collections with Historic Spellings

In this paper, we describe a new approach to retrieval in texts with non-standard spelling, which is important for historic texts in English or German. For this purpose, we present a new algorithm for generating search term variants in ancient orthography. By applying a spell checker to a corpus of historic texts, we generate a list of candidate terms, to which the contemporary spellings have to be assigned manually. From these pairs, our algorithm then produces a set of probabilistic rules. The rule probabilities can be used for ranking in the retrieval stage. An experimental comparison shows that our approach outperforms competing methods.
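
The idea of expanding a contemporary search term with probabilistic rules can be illustrated with a minimal sketch. It is not the authors' algorithm: the rule format (modern substring, historic substring, probability), the example rules and their probabilities, the `min_prob` cutoff, and the assumption that rule probabilities simply multiply are all hypothetical and chosen only for illustration.

```python
# Hypothetical rewrite rules: (modern substring, historic substring, probability).
# The values are made up for illustration and are not taken from the paper.
RULES = [
    ("t", "th", 0.4),
    ("ei", "ey", 0.3),
    ("u", "v", 0.1),
]


def generate_variants(term, rules, min_prob=0.05):
    """Return (variant, probability) pairs, sorted by descending probability.

    Each rule is applied to one occurrence at a time in every variant
    produced so far. Probabilities of combined rules are multiplied,
    which assumes the rules are independent (an assumption of this sketch).
    """
    variants = {term: 1.0}  # the unchanged term always matches
    for modern, historic, prob in rules:
        additions = {}
        for word, p in variants.items():
            start = word.find(modern)
            while start != -1:
                candidate = word[:start] + historic + word[start + len(modern):]
                score = p * prob
                # Keep only sufficiently likely variants, and the best score per variant.
                if score >= min_prob and score > additions.get(candidate, 0.0):
                    additions[candidate] = score
                start = word.find(modern, start + 1)
        for candidate, score in additions.items():
            if score > variants.get(candidate, 0.0):
                variants[candidate] = score
    return sorted(variants.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for variant, probability in generate_variants("teil", RULES):
        print(f"{variant}\t{probability:.2f}")
```

In a retrieval setting, the generated variants could be added to the query as alternatives, with each variant's probability used as a weight when ranking the matching documents.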

Norbert Fuhr | Andrea Ernst-Gerlach
