Using Information Gain to Filter Information in CLEF CL-SR Track

This paper describes the first participation of the SINAI team in the CLEF 2007 CL-SR track. The team participated only in the English task. The English collection includes segments of audio transcribed by speech recognition, together with topics for evaluating information retrieval systems. The collection contains manually segmented interviews with Holocaust survivors, and each segment includes several fields with additional information. The topics for the English task are available in Czech, English, French, German, Dutch and Spanish. This year, the team's aim was only to establish a first contact with the task and the collection. To that end, the collection was pre-processed using the Information Gain technique in order to filter the fields with the most relevant information. The Lemur toolkit was the information retrieval system used in the experiments.
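The abstract does not spell out how Information Gain was computed, so the following is only a minimal sketch of the standard definition, IG = H(labels) - H(labels | feature), for a binary feature. The function names, the boolean "feature present" encoding, and the toy labels are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(has_feature, labels):
    """Entropy of the labels minus their entropy conditioned on a binary feature.

    has_feature: one boolean per item (hypothetical encoding, e.g. whether
                 a given field or term is present in a segment).
    labels:      one class label per item, aligned with has_feature.
    """
    total = len(labels)
    if total == 0:
        return 0.0
    present = [y for f, y in zip(has_feature, labels) if f]
    absent = [y for f, y in zip(has_feature, labels) if not f]
    conditional = (
        (len(present) / total) * entropy(present)
        + (len(absent) / total) * entropy(absent)
    )
    return entropy(labels) - conditional


# Toy data, invented for illustration only.
labels = ["relevant", "relevant", "other", "other"]
predictive = [True, True, False, False]      # splits the classes perfectly
uninformative = [True, False, True, False]   # tells us nothing about the class

print(information_gain(predictive, labels))
print(information_gain(uninformative, labels))
```

On this toy data the perfectly predictive feature scores 1 bit and the uninformative one scores 0 bits. A filtering step in this spirit would rank candidate features by such a score and keep the highest-scoring ones; how the paper maps this onto collection fields is not stated in the abstract.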
