Keyword Spices: A New Method for Building Domain-Specific Web Search Engines

This paper presents a new method for building domain-specific web search engines. Previous methods eliminate irrelevant documents from the pages accessed using heuristics based on human knowledge about the domain in question. Accordingly, they are hard to build and can not be applied to other domains. The keyword spice method, in contrast, improves search performance by adding domain-specific keywords, called keyword spices, to the user's input query; the modified query is then forwarded to a general-purpose search engine. Keyword spices can be effectively discovered automatically from web documents allowing us to build high quality domain-specific search engines in various domains without requiring the collection of heuristic knowledge. We describe a machine learning algorithm, which is a type of decision-tree learning algorithm, that can extract keyword spices. To demonstrate the value of the proposed approach, we conduct experiments in the domain of cooking. The results confirm the excellent performance of our method in terms of both precision and recall.