Rule-based Search in Text Databases with Nonstandard Orthography

In this article, we describe our interdisciplinary project 'Rule-based search in text databases with nonstandard orthography (RSNSR)' in support of the conservation of cultural heritage, especially for the German reception of the philosopher Nietzsche. We present a rule-based fuzzy search engine that allows users to retrieve text data independently of its orthographical realization. The rules used are derived from statistical analyses, historical publications, linguistic principles, and expert knowledge. Our Web-based tool is intended for experts as well as interested amateurs. Along with its present features, further functions are currently worked out. Among them are automatic rule derivation and finer result classification through a generalized Levenshtein similarity measure. Our work is associated with the recently launched project Deutsch Diachron Digital (DDD) to build a complete diachronic corpus of German for the first time with texts from the ninth century (Old High German) to the present (Modern German).