A Statistical Language Modelling Approach for Question Answering System

This paper describes a question answering (QA) system built on a statistical language modeling approach. The main objective is to build a simple QA system that does not depend on highly tuned linguistic modules, which require considerable human effort and are difficult to debug. We derive a mathematical model for answer retrieval and answer extraction that uses no linguistic information or annotated data; it relies only on word tokens and web data. We take a statistical, noisy-channel approach and treat QA as a whole as a classification problem. We present a fully data-driven mathematical model for estimating the probability of a candidate answer given a question. In doing so, we largely remove the need for the ad hoc weights and parameters that were a feature of many TREC systems.
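
As a rough illustration of the noisy-channel view, candidate answers A can be ranked for a question Q by log P(A) + log P(Q | A), which is proportional to P(A | Q) by Bayes' rule. The sketch below is a toy under stated assumptions, not the model derived in this paper: the function name `rank_answers`, the add-one-smoothed unigram likelihood, the length-based prior, and the tiny `contexts` data are all hypothetical choices made for illustration. It uses only word tokens, with no linguistic annotation.

```python
import math
from collections import Counter


def rank_answers(question, candidates, contexts):
    """Rank candidates A by log P(A) + log P(Q | A).

    Assumption (illustrative only): P(Q | A) is an add-one-smoothed unigram
    model estimated from the tokens seen near A, and P(A) is proportional
    to how much text mentions A.
    """
    q_tokens = question.lower().split()
    # Shared vocabulary over all context tokens and the question tokens.
    vocab = {w for toks in contexts.values() for w in toks} | set(q_tokens)
    total = sum(len(contexts[a]) for a in candidates)
    scores = {}
    for a in candidates:
        counts = Counter(contexts[a])
        n = len(contexts[a])
        # Smoothed prior over candidates.
        log_prior = math.log((n + 1) / (total + len(candidates)))
        # Smoothed unigram likelihood of the question given the candidate.
        log_lik = sum(
            math.log((counts[w] + 1) / (n + len(vocab))) for w in q_tokens
        )
        scores[a] = log_prior + log_lik
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data standing in for web text retrieved per candidate.
contexts = {
    "paris": "capital of france is paris".split(),
    "berlin": "capital of germany is berlin".split(),
}
ranking = rank_answers("capital of france", ["paris", "berlin"], contexts)
```

Because every quantity here is a probability estimated from token counts, the ranking involves no hand-set weights, which is the property the abstract emphasizes.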