Top-k Query Based on Statistical Information Extraction Model

In deterministic applications, top-k ranking is defined by a scoring function. In uncertain applications no such clean definition exists, because whether a tuple is reported in a top-k answer depends not only on its score but also on its membership probability. This work introduces an approach to processing top-k queries based on a statistical information extraction model, which lets us estimate the probability that an observed extraction is correct. We validate the model's performance empirically on information extraction tasks over two data sets.
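
To make the interplay between score and membership probability concrete, the following minimal sketch computes, for each tuple, the probability that it appears in the top-k answer. It is an illustration only, not the model proposed in this work: it assumes that tuples exist independently of one another, that scores are distinct, and that each tuple's membership probability is already known (for example, as the estimated probability that an extraction is correct). The function name `topk_membership_probabilities` and the sample data are hypothetical.

```python
def topk_membership_probabilities(tuples, k):
    """Return {name: P(tuple is in the top-k answer)}.

    tuples: list of (name, score, prob), where prob is the tuple's
    membership probability. Assumes independent tuples and distinct scores.
    """
    # Process tuples from highest to lowest score.
    ranked = sorted(tuples, key=lambda t: t[1], reverse=True)

    # dist[j] = probability that exactly j of the higher-scored tuples
    # exist, tracked only for j < k (mass for j >= k is discarded).
    dist = [1.0] + [0.0] * (k - 1)
    result = {}

    for name, _score, prob in ranked:
        # The tuple is in the top-k if it exists and fewer than k
        # higher-scored tuples exist.
        result[name] = prob * sum(dist)

        # Fold this tuple into the count distribution.
        updated = [0.0] * k
        for j in range(k):
            updated[j] = dist[j] * (1.0 - prob)
            if j > 0:
                updated[j] += dist[j - 1] * prob
        dist = updated

    return result


# Hypothetical extractions: (name, score, membership probability).
sample = [("a", 0.9, 0.5), ("b", 0.8, 1.0), ("c", 0.7, 0.8)]
top1 = topk_membership_probabilities(sample, k=1)
top2 = topk_membership_probabilities(sample, k=2)
```

In this sample, tuple "a" has the highest score but exists only half the time, so for k = 1 it is no more likely to be reported than the lower-scored but certain tuple "b". A ranking by score alone would hide this.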