An error-driven approach to query segmentation

Query segmentation is the task of splitting a query into a sequence of non-overlapping segments that together cover every token in the query. The majority of query segmentation methods are unsupervised. In this paper, we propose an error-driven approach to query segmentation (EDQS) that uses search logs to guide unsupervised training with system-specific errors. In EDQS, we first detect the errors of a base segmentation system by examining the consistency among its segmentations of similar queries. A model is then trained on the detected errors to select the correct segmentation of a new query from the system's top-n outputs. Our evaluation results show that EDQS significantly boosts the performance of state-of-the-art query segmentation methods on a publicly available data set.
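To make the task definition and the consistency idea concrete, the following is a minimal illustrative sketch, not the EDQS implementation. It enumerates every segmentation of a tokenized query and flags adjacent token pairs that a segmenter joins in one query but splits in another. The function names and the bigram-level consistency test are assumptions introduced here for illustration; the abstract does not specify how similar queries are chosen or how consistency is measured.

```python
from collections import defaultdict
from itertools import combinations


def all_segmentations(tokens):
    """Yield every split of `tokens` into contiguous, non-overlapping
    segments that together cover all tokens (2**(n-1) splits for n tokens)."""
    n = len(tokens)
    if n == 0:
        return
    for k in range(n):
        # Choose k break positions among the n-1 gaps between tokens.
        for breaks in combinations(range(1, n), k):
            bounds = (0, *breaks, n)
            yield [tuple(tokens[i:j]) for i, j in zip(bounds, bounds[1:])]


def bigram_decisions(segmentation):
    """Return (bigram, joined) pairs for each adjacent token pair, where
    `joined` is True if both tokens fall in the same segment."""
    flat = [(tok, idx) for idx, seg in enumerate(segmentation) for tok in seg]
    return [((a, b), ia == ib) for (a, ia), (b, ib) in zip(flat, flat[1:])]


def inconsistent_bigrams(segmentations):
    """Hypothetical consistency check (an assumption, not the paper's method):
    a bigram is suspicious if it is joined in one query and split in another."""
    seen = defaultdict(set)
    for seg in segmentations:
        for bigram, joined in bigram_decisions(seg):
            seen[bigram].add(joined)
    return {bigram for bigram, votes in seen.items() if len(votes) == 2}


if __name__ == "__main__":
    for seg in all_segmentations(["new", "york", "times"]):
        print(seg)
    system_output = [
        [("new", "york"), ("hotels",)],
        [("cheap",), ("new",), ("york",)],
    ]
    print(inconsistent_bigrams(system_output))
```

In this toy input, the pair ("new", "york") is joined in the first query and split in the second, so it is flagged as a candidate error. A flagged pair only indicates that at least one of the two segmentations is likely wrong; deciding which one would require additional evidence, such as the search logs mentioned above.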