Does Online Evaluation Correspond to Offline Evaluation in Query Auto Completion?

Query Auto Completion is the task of suggesting queries to the users of a search engine while they are typing a query in the search box. In recent years there has been renewed research interest in improving the quality of these suggestions, and the published improvements have been assessed using offline evaluation techniques and metrics. In this paper, we compare online and offline assessments of Query Auto Completion. We show that there is a large potential for significant bias when the raw data collected during an online experiment is subsequently reused in offline experiments to evaluate new methods.