A Hybrid Approach to Natural Language Web Search

We describe a hybrid approach to improving search performance by providing a natural language front end to a traditional keyword-based search engine. The key component of the system is iterative query formulation and retrieval, in which one or more queries are automatically formulated from the user's question, issued to the search engine, and the results accumulated to form the hit list. New queries are generated by relaxing previously-issued queries using transformation rules, applied in an order obtained by reinforcement learning. This statistical component is augmented by a knowledge-driven hub-page identifier that retrieves a hub-page for the most salient noun phrase in the question, if possible. Evaluation on an unseen test set over the www.ibm.com public website with 1.3 million webpages shows that both components make substantial contribution to improving search performance, achieving a combined 137% relative improvement in the number of questions correctly answered, compared to a baseline of keyword queries consisting of two noun phrases.