Today's Recommendations

1999 - TREC

The TREC-8 Question Answering Track Report

The TREC-8 Question Answering track was the first large-scale evaluation of domain-independent question answering systems. This paper summarizes the results of the track by giving a brief overview of the different approaches taken to solve the problem. The most accurate systems found a correct response for more than 2/3 of the questions. Relatively simple bag-of-words approaches were adequate for finding answers when responses could be as long as a paragraph (250 bytes), but more sophisticated processing was necessary for more direct responses (50 bytes).

The TREC-8 Question Answering track was an initial effort to bring the benefits of large-scale evaluation to bear on a question answering (QA) task. The goal in the QA task is to retrieve small snippets of text that contain the actual answer to a question rather than the document lists traditionally returned by text retrieval systems. The assumption is that users would usually prefer to be given the answer rather than find the answer themselves in a document. This paper summarizes the retrieval results of the track; a companion paper ("The TREC-8 Question Answering Track Evaluation") gives details about how the evaluation was implemented. By necessity, a track report can give only an overview of the different approaches used in the track. Readers are urged to consult the participants' papers elsewhere in the Proceedings for details regarding a particular approach.
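To make the "bag-of-words" idea concrete, here is a minimal sketch of a passage scorer that ranks 250-byte windows of a document by how many question terms they contain. It is an illustration only: the tokenizer, stopword list, window construction, and scoring are assumptions for this sketch, not the method of any TREC-8 participant.

```python
# Minimal bag-of-words passage scoring sketch: score each <=250-byte window of a
# document by its term overlap with the question and return the best window.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "who", "what", "when", "where", "is", "was"}

def tokenize(text: str) -> list[str]:
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def best_window(question: str, document: str, max_bytes: int = 250) -> str:
    """Return the document window (at most max_bytes characters) with the
    largest bag-of-words overlap with the question."""
    q_terms = Counter(tokenize(question))
    best_score, best_span = -1, ""
    words = document.split()
    for i in range(len(words)):
        span_words, size = [], 0
        for w in words[i:]:
            if size + len(w) + 1 > max_bytes:
                break
            span_words.append(w)
            size += len(w) + 1
        span = " ".join(span_words)
        score = sum(min(q_terms[t], c) for t, c in Counter(tokenize(span)).items())
        if score > best_score:
            best_score, best_span = score, span
    return best_span

print(best_window("Who invented the paper clip?",
                  "Johan Vaaler, a Norwegian inventor, is often credited with "
                  "inventing the paper clip, although earlier designs existed."))
```

A scorer this crude can still surface a 250-byte snippet that happens to contain the answer, which is consistent with the observation above that longer responses were forgiving of simple methods while 50-byte responses demanded more careful answer extraction.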

2005

Overview of the TREC 2003 Question Answering Track

The TREC 2003 question answering track contained two tasks, the passages task and the main task. In the passages task, systems returned a single text snippet in response to factoid questions; the evaluation metric was the number of snippets that contained a correct answer. The main task contained three separate types of questions: factoid questions, list questions, and definition questions. Each of the questions was tagged as to its type, and the different question types were evaluated separately. The final score for a main task run was a combination of the scores for the separate question types. This paper defines the various tasks included in the track and reports the evaluation results. Since the TREC 2003 track was the first time for significant participation in the definition and list subtasks, the paper also examines the reliability of the evaluation for these tasks.

TREC introduced the first question answering (QA) track in TREC-8 (1999). The goal of the track is to foster research on systems that retrieve answers rather than documents in response to a question, with particular emphasis on systems that can function in unrestricted domains. The tasks in the track have evolved over the years to focus research on particular aspects of the problem deemed important to improving the state of the art.

The task in the original QA tracks required systems to return text snippets drawn from a large corpus of newspaper articles in response to closed-class or factoid questions such as “Who invented the paper clip?”. Each response was judged by a human assessor; a response was marked correct if an answer to the question was contained within the snippet. Unfortunately, the relative effectiveness of different systems was masked by the fact that two different snippets could both contain a correct answer while one was a significantly better response than the other [4]. To force systems to demonstrate their ability to locate the actual answer, the TREC 2002 task required systems to return exact answers: text strings consisting of a complete answer and nothing else. Strings that contained a right answer with additional text were judged to be “inexact” and did not contribute to a system’s score.

Pinpointing the precise extent of an answer is a more difficult problem than finding a text segment that contains an answer, and there are applications of QA technology that do not require this extra step. To provide a forum for research groups interested in these applications, the TREC 2003 track included a “passages” task that allowed text segments containing answers to be returned. The other task in the track, the main task, required exact responses. While the test set of questions for the passages task contained only factoid questions, the main task contained list and definition questions as well as factoid questions. Each question type was evaluated separately, and the final score for a main task run was a combination of the scores for the three question types.

This paper provides an overview of the results of the TREC 2003 QA track. The first two sections describe the two tasks in the track and present the evaluation results for the tasks. Since the TREC 2003 track was the first time for significant participation in the definition and list subtasks, Section 4 examines the reliability of the evaluation used for these tasks. This analysis demonstrates that the evaluation results for the definition task must be interpreted with care, since using different assessors can cause substantial changes in the relative evaluation scores. Using more questions in the definition test set should increase the stability of the evaluation.
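To make the score combination concrete, the sketch below computes a main task score as a weighted sum of the three per-type scores. The 0.5/0.25/0.25 weighting is the combination commonly reported for the TREC 2003 main task, but it is used here as an assumption rather than taken from the abstract above; the function name and example numbers are hypothetical.

```python
# Sketch of combining per-question-type scores into a single main task score.
# Assumed weighting: 0.5 * factoid accuracy + 0.25 * list F + 0.25 * definition F.

def main_task_score(factoid_accuracy: float,
                    list_avg_f: float,
                    definition_avg_f: float,
                    weights: tuple[float, float, float] = (0.5, 0.25, 0.25)) -> float:
    """Weighted combination of the three component scores, each in [0, 1]."""
    w_factoid, w_list, w_def = weights
    return (w_factoid * factoid_accuracy
            + w_list * list_avg_f
            + w_def * definition_avg_f)

# Example: a run with 70% factoid accuracy, list F = 0.30, definition F = 0.45.
print(main_task_score(0.70, 0.30, 0.45))  # combined score of roughly 0.54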

Paper Keywords

time series, software development, information retrieval, regression model, image retrieval, maximum likelihood, knowledge base, retrieval system, model checking, distance learning, real-time system, question answering, extreme learning machine, learning machine, information retrieval system, extreme learning, order statistic, content-based image retrieval, temporal logic, rate control, formal method, statistical inference, weibull distribution, nuclear reactor, visual attention, image retrieval system, question answering system, carnegie mellon university, binary decision diagram, java virtual machine, answering system, atrial fibrillation, carnegie mellon, memory network, random sequence, mellon university, extreme programming, southeast asia, research issue, model checker, extreme event, belief revision, visual question answering, bounded model checking, symbolic model, visual question, abstract model, extreme value theory, bounded model, symbolic model checking, automated storage, statistically significant, bibliography index, arithmetic logic unit, model checking technique, extreme value distribution, model checking algorithm, extreme weather, south pacific, interactive information retrieval, sample variance, multivariate extreme, open-domain question answering, model checking based, state of knowledge, extreme temperature, answering question, question answering dataset, extreme rainfall, open-domain question, question answering track, extreme precipitation, daily temperature, logic model checking, answering track, symbolic model checker, desired property, counterexample-guided abstraction refinement, sat-based model checking, temperature extreme, extreme precipitation event, climate extreme, formal methods community, extreme storm, climate event, sat-based model, precipitation extreme, french polynesia, image question answering, lazy abstraction, severe thunderstorm, modeling of extreme, silo (dataset), pipeline (computing), word list by frequency, reactor device component, reactor (software), united state