Paper information - Linguistic features to predict query difficulty

Query difficulty can have a number of causes. Some of them relate to the query expression itself and can therefore be detected through a linguistic analysis of the query text. Using 16 linguistic features, computed automatically on TREC queries, we looked for significant correlations between these features and the average recall and precision scores obtained by systems. Three of these features are shown to have a significant impact on either recall or precision scores for previous ad hoc TREC campaigns. Each of these features can be viewed as a clue to a specific linguistic characteristic, whether morphological, syntactic, or semantic. These results also open the way for a more informed use of linguistic processing in IR systems.
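To make the idea of correlating a query feature with retrieval scores concrete, here is a minimal sketch. It assumes a Pearson correlation coefficient, which the abstract does not specify, and the feature values and precision scores below are invented purely for illustration; they are not data from the paper. The names `pearson`, `feature_values`, and `avg_precision` are likewise hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one linguistic feature value per query, and the
# average precision that a set of systems obtained on that query.
feature_values = [2, 3, 3, 4, 5, 6]
avg_precision = [0.42, 0.38, 0.35, 0.30, 0.22, 0.18]

r = pearson(feature_values, avg_precision)
print(f"correlation between feature and average precision: {r:.3f}")
```

A coefficient far from zero would suggest that the feature is associated with query difficulty; deciding whether it is significant would additionally require a statistical test over the full query set.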

Josiane Mothe | Ludovic Tanguy
