Towards Answering How do I Questions Using Classification

Interest in developing open-domain question answering systems that leverage the massive amount of knowledge available on the Web is on the rise. In this investigation, we address the problem of answering "How do I" questions. Our goal is to use the top results returned by a search engine to extract and present correct answers. Identifying correct answers to such questions is a hard problem that seems to require deep natural language understanding. Fortunately, answers to "How do I" questions are often procedural, typically consisting of a sequence of successive actions. Learning to label text as procedural or non-procedural is an easier problem, which we attempted to solve by extracting 12 informative features and using them to train classifiers. However, the corpus built from the top documents retrieved for a set of queries equivalent to "How do I" questions turned out to be highly imbalanced. To tackle this issue, we applied sampling techniques with a variety of classification methods, obtaining reasonable recall and precision for the minority class of procedural texts.
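The abstract does not say which sampling techniques or classifiers were used, so the following is only a minimal sketch of one common option, random oversampling of the minority class, together with the per-class precision and recall used to judge it. The function names `random_oversample` and `precision_recall`, the class labels, and the two-dimensional toy feature vectors are hypothetical illustrations; they are not the paper's 12 features or its actual pipeline.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen examples of each smaller
    class (sampling with replacement) until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = rng.choices(pool, k=target - n)  # empty list when n == target
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for a single class, e.g. the minority 'procedural' class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, made-up feature vectors: one procedural text versus four others.
X = [[0.9, 3], [0.1, 0], [0.2, 1], [0.0, 0], [0.3, 0]]
y = ["procedural", "other", "other", "other", "other"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have 4 examples

print(precision_recall(["procedural", "other", "procedural", "other"],
                       ["procedural", "procedural", "other", "other"],
                       "procedural"))  # (0.5, 0.5)
```

In such a setup, oversampling would normally be applied only to the training split, so that the precision and recall reported for the procedural class are measured on the original, imbalanced distribution.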