Paper information - Random walk term weighting for information retrieval

Random walk term weighting for information retrieval

We present a way of estimating term weights for Information Retrieval (IR), using term co-occurrence as a measure of dependency between terms. We apply a random walk graph-based ranking algorithm to a graph that encodes terms and their co-occurrence dependencies in text, and from it we derive term weights that quantify how much a term contributes to its context. Evaluation on two TREC collections and 350 topics shows that the random walk-based term weights perform at least comparably to traditional tf-idf term weighting, and outperform it when the distance between co-occurring terms is between 6 and 30 terms.
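The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the undirected graph, the PageRank-style update, the damping factor of 0.85, the fixed iteration count, and the names `build_cooccurrence_graph`, `random_walk_weights`, and `max_distance` are all assumptions made for illustration. Only the notion of linking terms that co-occur within some distance and ranking them with a random walk comes from the abstract.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, max_distance=6):
    """Link distinct terms that occur at most `max_distance` positions apart.

    Assumption: edges are undirected and unweighted.
    """
    neighbours = defaultdict(set)
    for i, term in enumerate(tokens):
        neighbours[term]  # register the term as a vertex even if it gets no edges
        for other in tokens[i + 1:i + 1 + max_distance]:
            if other != term:
                neighbours[term].add(other)
                neighbours[other].add(term)
    return dict(neighbours)


def random_walk_weights(neighbours, damping=0.85, iterations=50):
    """PageRank-style power iteration over the term graph.

    Assumption: damping=0.85 and a fixed number of iterations; the abstract
    does not state these settings.
    """
    n = len(neighbours)
    if n == 0:
        return {}
    scores = {term: 1.0 / n for term in neighbours}
    for _ in range(iterations):
        updated = {}
        for term in neighbours:
            # Each neighbour splits its current score evenly over its edges.
            incoming = sum(scores[u] / len(neighbours[u]) for u in neighbours[term])
            updated[term] = (1.0 - damping) / n + damping * incoming
        scores = updated
    return scores


if __name__ == "__main__":
    text = "random walk term weights rank each term by its co-occurrence context"
    graph = build_cooccurrence_graph(text.split(), max_distance=3)
    ranked = sorted(random_walk_weights(graph).items(), key=lambda kv: -kv[1])
    for term, weight in ranked:
        print(f"{term}\t{weight:.4f}")
```

In this sketch `max_distance` plays the role of the co-occurrence distance mentioned in the abstract. Larger values produce a denser graph, so the resulting weights depend on that setting.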

Christina Lioma | Roi Blanco
