The TREC 2005 Question Answering track contained three tasks: the main question answering task, the document ranking task, and the relationship task. The main task was the same as the single TREC 2004 QA task: question series were used to define a set of targets, where each series was about a single target and contained factoid and list questions. The final question in each series was an "Other" question that asked for additional information about the target not covered by the previous questions in the series. The document ranking task was to return, for each question in a subset of the questions in the main task, a ranked list of documents thought to contain an answer to the question. In the relationship task, systems were given TREC-like topic statements that ended with a question asking for evidence for a particular relationship.

The goal of the TREC question answering (QA) track is to foster research on systems that return answers themselves, rather than documents containing answers, in response to a question. The track started in TREC-8 (1999), with the first several editions of the track focused on factoid questions. A factoid question is a fact-based, short-answer question such as "How many calories are there in a Big Mac?". The task in the TREC 2003 QA track was a combined task that contained list and definition questions in addition to factoid questions [1]. A list question asks for different instances of a particular kind of information to be returned, such as "List the names of chewing gums". Answering such questions requires a system to assemble an answer from information located in multiple documents. A definition question asks for interesting information about a particular person or thing, such as "Who is Vlad the Impaler?" or "What is a golden parachute?".
Definition questions also require systems to locate information in multiple documents, but in this case the information of interest is much less crisply delineated. The TREC 2004 test set contained factoid and list questions grouped into different series, where each series had the target of a definition associated with it [2]. Each question in a series asked for some information about the target. In addition, the final question in each series was an explicit "Other" question, which was to be interpreted as "Tell me other interesting things about this target I don't know enough to ask directly". This last question is roughly equivalent to the definition questions in the TREC 2003 task.

Several concerns regarding the TREC 2005 QA track were raised during the TREC 2004 QA breakout session. Since the TREC 2004 task was rather different from previous years' tasks, there was a desire to repeat the task largely unchanged. There was also a desire to build infrastructure that would allow a closer examination of the role document retrieval techniques play in supporting QA technology. As a result of this discussion, the main task for the 2005 QA track was made essentially the same as the 2004 task, in that the test set consisted of a set of question series where each series asks for information regarding a particular target. As in TREC 2004, the targets included people, organizations, and other entities (things); unlike TREC 2004, the target could also be an event. Events were added since the document set from which the answers are to be drawn consists of newswire articles. The runs were evaluated using the same methodology as in TREC 2004, except that the primary measure was the per-series score instead of the combined component score.

The document ranking task was added to the TREC 2005 track to address the concern regarding document retrieval and QA.
The task was to submit, for a subset of 50 of the questions in the main task, a ranked list of up to 1000 documents for each question. Groups whose primary emphasis was document retrieval rather than QA were allowed to participate in the document ranking task without submitting actual answers for the main task. However, all TREC 2005 submissions to the main task were required to include a ranked list of documents for each question in the document ranking subset.
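To make the task mechanics above concrete, the sketch below illustrates two pieces of the setup: formatting a ranked document list (truncated to 1000 documents) in the conventional TREC run-file layout, and computing a per-series score as a weighted combination of a series' component scores, averaged over all series. The field layout, the component weights, and the function names are illustrative assumptions, not the official track definitions.

```python
# Illustrative sketch only: the run-file field layout and the scoring
# weights below are assumptions, not the official TREC definitions.

def format_run_lines(qid, ranked_docs, tag="myrun", max_docs=1000):
    """Emit one 'qid Q0 docid rank score tag' line per document,
    truncating the ranking to at most max_docs (1000 in this task)."""
    return [f"{qid} Q0 {docid} {rank} {score:.4f} {tag}"
            for rank, (docid, score) in enumerate(ranked_docs[:max_docs], 1)]

def series_score(factoid_acc, list_f, other_f,
                 w_factoid=0.5, w_list=0.25, w_other=0.25):
    """Weighted combination of one series' component scores
    (factoid accuracy, list F, Other F); weights are assumed."""
    return w_factoid * factoid_acc + w_list * list_f + w_other * other_f

def per_series_score(series_scores):
    """Primary measure sketched here as the mean over series."""
    return sum(series_scores) / len(series_scores)
```

Averaging at the series level, rather than pooling all component scores, gives each target equal weight regardless of how many questions its series contains.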
[1] Jimmy J. Lin. Is Question Answering Better than Information Retrieval? Towards a Task-Based Evaluation Framework for Question Series. HLT-NAACL, 2007.
[2] Hoa Trang Dang et al. Overview of the TREC 2006 Question Answering Track. TREC, 2006.
[3] Jimmy J. Lin et al. Will Pyramids Built of Nuggets Topple Over? NAACL, 2006.
[4] Ellen M. Voorhees et al. Overview of the TREC 2004 Novelty Track. 2005.
[5] Tsuneaki Kato et al. Handling Information Access Dialogue through QA Technologies - A novel challenge for open-domain question answering. 2004.
[6] James Allan et al. HARD Track Overview in TREC 2003: High Accuracy Retrieval from Documents. TREC, 2003.
[7] Iadh Ounis et al. The TREC Blogs06 Collection: Creating and Analysing a Blog Test Collection. 2006.
[8] Ellen M. Voorhees et al. Using Question Series to Evaluate Question Answering System Effectiveness. HLT, 2005.
[9] Jimmy J. Lin et al. Different Structures for Evaluating Answers to Complex Questions: Pyramids Won't Topple, and Neither Will Human Assessors. ACL, 2007.