Analyses for elucidating current question answering technology

In this paper, we take a detailed look at the performance of components of an idealized question answering system on two different tasks: the TREC Question Answering task and a set of reading comprehension exams. We carry out three types of analysis: inherent properties of the data, feature analysis, and performance bounds. Based on these analyses we explain some of the performance results of the current generation of Q/A systems and make predictions on future work. In particular, we present four findings: (1) Q/A system performance is correlated with answer repetition; (2) relative overlap scores are more effective than absolute overlap scores; (3) equivalence classes on scoring functions can be used to quantify performance bounds; and (4) perfect answer typing still leaves a great deal of ambiguity for a Q/A system because sentences often contain several items of the same type.