Applying Bayesian belief networks in approximate string matching for robust keyword-based retrieval

We present a novel approach to robust keyword-based retrieval in which Bayesian belief networks are applied in a word-model-based approximate string matching algorithm. Besides performing reliably in a working implementation on standard sources such as digital text, the wholly probabilistic modeling allows confidence measures and hypotheses from preprocessing stages, such as handwriting recognition or optical character recognition, to be integrated while respecting the uncertainties at the lower levels. Furthermore, it provides a flexible method for modeling specific error types that originate from human users and from various input sources. The algorithms were evaluated extensively against the Levenshtein distance, which can be seen as the basis of state-of-the-art methods in this research field. The tests ran on a 14 K database of common international music titles and on four 10 K databases consisting of the most frequently used words in English, German, French and Dutch.
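
For orientation, the baseline named above can be sketched as follows. This is a minimal illustration of keyword retrieval by Levenshtein distance only, not the Bayesian belief network method presented here. The function names `levenshtein` and `retrieve`, the `max_distance` threshold and the sample vocabulary are assumptions made for the example.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca by cb, or match
            ))
        previous = current
    return previous[-1]


def retrieve(query: str, vocabulary: List[str],
             max_distance: int = 2) -> List[Tuple[int, str]]:
    """Return (distance, entry) pairs within max_distance, closest first.
    The threshold is a hypothetical parameter chosen for illustration."""
    scored = ((levenshtein(query, entry), entry) for entry in vocabulary)
    return sorted(pair for pair in scored if pair[0] <= max_distance)


if __name__ == "__main__":
    # Hypothetical vocabulary with a misspelled query.
    titles = ["yesterday", "imagine", "yesterdays"]
    for distance, title in retrieve("yesterdy", titles):
        print(distance, title)
```

A fixed edit distance treats every insertion, deletion and substitution as equally costly, whereas a probabilistic model can, in principle, weight errors by their source and carry recognizer confidences through to the match score.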