Intention extraction and semantic matching for internet FAQ retrieval using spoken language query

An FAQ (frequently asked question) pattern consists of a question and a text document that answers the question and contains some additional remarks. When a query is similar to an FAQ’s question, the FAQ’s answer gives a possible answer, or part of the answer, to the query. On the other hand, an FAQ’s answer may also contain information that does not concern the corresponding FAQ’s question but embeds the answer to other questions. For a given query, therefore, the answer can be obtained from both the FAQ’s question and its answer. In this paper, we propose a framework for Internet FAQ retrieval using spoken language queries. We focus on two points: (1) extraction of the main intention embedded in a query sentence and (2) semantic comparison between a query sentence and an FAQ pattern. To evaluate system performance, we collected 1022 FAQ patterns and 185 query sentences. In intention extraction, 91.9% of intention segments are extracted correctly. Compared to the keyword-based approach, the recall rate for the top 10 candidates improves from 78.06% to 95.28%.
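
For readers unfamiliar with the metric, the sketch below shows one common way to compute a recall rate over the top N candidates. It is only an illustration under stated assumptions: the abstract does not give the exact definition used in the experiments, so the function name `recall_at_n`, the pooled (micro-averaged) formulation, and the toy query and FAQ identifiers are all hypothetical.

```python
def recall_at_n(rankings, relevant, n=10):
    """Pooled recall over the top-n candidates.

    rankings: dict mapping a query id to its ranked list of FAQ ids (best first).
    relevant: dict mapping a query id to the set of FAQ ids judged relevant.
    Assumption: recall is the number of relevant FAQs found in the top n,
    summed over all queries, divided by the total number of relevant FAQs.
    """
    retrieved = 0
    total = 0
    for query_id, relevant_ids in relevant.items():
        top_n = set(rankings.get(query_id, [])[:n])
        retrieved += len(top_n & relevant_ids)
        total += len(relevant_ids)
    return retrieved / total if total else 0.0


# Toy data (hypothetical ids, not taken from the paper's collection).
rankings = {
    "q1": ["f3", "f7", "f1"],
    "q2": ["f2", "f9", "f4"],
}
relevant = {
    "q1": {"f7"},
    "q2": {"f4", "f8"},
}

print(recall_at_n(rankings, relevant, n=2))
print(recall_at_n(rankings, relevant, n=3))
```

With this formulation, raising N can only keep recall the same or increase it, which is why results are reported at a fixed cutoff such as the top 10.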