Categorizing Queries by Topic Directory

Categorizing a Web user query by topic or category can help select Web sources that contain the required information. In pursuit of this goal, we explore methods for mapping user queries to category hierarchies under which deep Web resources are also assumed to be classified. Our sources for these category hierarchies, or directories, are Yahoo! Directory and Wikipedia. Forwarding an unrefined query (in our case, a typical fact-finding query sent to a question answering system) directly to these directory resources usually returns either no directories or incorrect ones. Instead, we develop techniques that generate more specific directory-finding queries from an unrefined query and use these to retrieve better directories. Even with these engineered queries, our two resources often return multiple directories, many of which are incorrect, i.e., their categories are not related to the query, and thus Web resources for these categories are unlikely to contain the required information. We therefore develop methods for selecting the most useful directories among those returned. We consider a directory to be useful if Web sources for any of its narrow categories are likely to contain the searched-for information. We evaluate our mapping system on a set of 250 TREC questions and obtain precision and recall in the 0.8 to 1.0 range.