The design of the AskMSR question answering system is motivated by recent observations in natural language processing that, for many applications, significant improvements in accuracy can be attained simply by increasing the amount of data used for learning (e.g., Banko & Brill, 2001). Rather than relying heavily on linguistically intensive techniques, we developed a simple but effective question answering system that takes advantage of the vast amount of online text available via the World Wide Web. Many groups working on question answering use a variety of linguistic resources, such as part-of-speech tagging, parsing, named entity extraction, and WordNet. We chose instead to focus on the tremendous resource that the web provides simply as a gigantic data repository. The web, which is home to billions of pages of electronic text, is orders of magnitude larger than the TREC QA document collection, which consists of fewer than 1 million documents.
[1] Jimmy J. Lin, et al. Data-Intensive Question Answering. TREC, 2001.
[2] Sabine Buchholz, et al. Using Grammatical Relations, Answer Frequencies and the World Wide Web for TREC Question Answering. TREC, 2001.
[3] Oren Etzioni, et al. Scaling question answering to the Web. WWW '01, 2001.
[4] Michele Banko, et al. Scaling to Very Very Large Corpora for Natural Language Disambiguation. ACL, 2001.
[5] Oren Etzioni, et al. Scaling question answering to the Web. WWW '01, 2001.
[6] Steve Renals, et al. Proceedings of the Ninth Text REtrieval Conference. 2001.