The results of TREC 2002 showed that the major challenge in recognizing relevant sentences is the lack of information available for computing similarity between sentences. In TREC 2003, NTU attempts to find relevant and novel information using variants of an information retrieval (IR) system. We call this methodology IR with reference corpus; it can also be regarded as a form of information expansion for sentences. Each sentence is treated as a query to a reference corpus, and the similarity between sentences is measured in terms of the weighting vectors of the document lists ranked by the IR system. In other words, we look for relevant sentences by comparing their retrieval results on a given IR system: two sentences are regarded as similar if they are related to similar document lists returned by the IR system. For novelty detection, the same kind of analysis compares each relevant sentence with all those that precede it. We also present an effective dynamic threshold setting based on the percentage of relevant sentences within a relevant document. This paper focuses on three points: first, how to use the results of an IR system to compare the similarity between sentences; second, how to filter out redundant sentences; and third, how to determine appropriate relevance and novelty thresholds.
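The comparison of sentences through their retrieved document lists can be illustrated with a minimal sketch. It assumes that each sentence has already been submitted as a query to a reference corpus and that the ranked result is available as a mapping from document identifier to retrieval weight. The function names `cosine` and `novel_indices`, the use of cosine similarity, and the fixed `threshold` argument are assumptions made for illustration only; the abstract does not specify the similarity function, and the method described above sets its thresholds dynamically rather than using a fixed value.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors (doc_id -> weight)."""
    dot = sum(w * v[d] for d, w in u.items() if d in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def novel_indices(vectors, threshold):
    """Return the positions of sentences judged novel.

    vectors   -- one doc_id -> weight mapping per relevant sentence,
                 in the order the sentences appear
    threshold -- hypothetical fixed cut-off; a sentence is redundant if
                 its similarity to any preceding sentence reaches it
    """
    novel = []
    for i, vec in enumerate(vectors):
        # Compare the sentence with all relevant sentences that precede it.
        if all(cosine(vec, prev) < threshold for prev in vectors[:i]):
            novel.append(i)
    return novel


# Toy retrieval results (assumed data): the second sentence retrieves the
# same documents as the first, the third retrieves a different document.
retrieved = [
    {"d1": 2.0, "d2": 1.0},
    {"d1": 2.0, "d2": 1.0},
    {"d3": 1.0},
]
print(novel_indices(retrieved, threshold=0.8))
```

In this toy input the second sentence is filtered out because its document list matches that of the first, while the third is kept because it shares no retrieved document with the earlier sentences.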