Using Information Gain to Filter Information in the CLEF CL-SR Track

This paper describes the first participation of the SINAI team in the CLEF 2007 CL-SR track. The team participated only in the English task. The English collection consists of manually segmented interviews with Holocaust survivors. Each segment is represented by automatic speech recognition (ASR) transcripts of the audio, and the collection includes a set of topics for evaluating information retrieval systems. Each segment also includes several fields that contain extra information. The topics for the English task are available in Czech, English, French, German, Dutch and Spanish. This year, the team's only goal was to establish a first contact with the task and the collection. Accordingly, the collection was pre-processed with the Information Gain technique to keep the fields that carry the most relevant information. The Lemur toolkit was used as the Information Retrieval system in the experiments.
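The abstract does not say exactly how Information Gain was applied to the segment fields. As an illustration only, the sketch below uses the standard feature-selection formulation, IG(C; t) = H(C) - P(t)H(C|t) - P(not t)H(C|not t). Here C is a binary relevance label and t is the presence of a term in a segment. The function names `information_gain` and `select_top_terms` and the toy data are hypothetical. They are not the SINAI team's implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_present, labels):
    """IG(C; F) = H(C) - P(F)H(C|F) - P(not F)H(C|not F) for a binary feature F.

    feature_present[i] says whether the feature (e.g. a term) occurs in item i;
    labels[i] is that item's class (e.g. relevant = 1, non-relevant = 0).
    """
    n = len(labels)
    with_f = [y for f, y in zip(feature_present, labels) if f]
    without_f = [y for f, y in zip(feature_present, labels) if not f]
    conditional = (len(with_f) / n) * entropy(with_f) + (
        len(without_f) / n
    ) * entropy(without_f)
    return entropy(labels) - conditional


def select_top_terms(docs, labels, k):
    """Hypothetical filter: score every term by the IG of its presence and keep the top k.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = set().union(*docs)
    scores = {t: information_gain([t in d for d in docs], labels) for t in vocab}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Toy data (hypothetical): each segment is a set of terms, labelled relevant (1) or not (0).
    docs = [{"camp", "family"}, {"camp", "train"}, {"school", "family"}, {"school", "music"}]
    labels = [1, 1, 0, 0]
    print(select_top_terms(docs, labels, 2))
```

In this toy run, "camp" and "school" each split the relevant segments from the non-relevant ones perfectly, so both get the maximum gain of 1 bit. "family" occurs equally often in both classes and gets 0. A field-level filter could aggregate such scores per field, but that aggregation step is an assumption here.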