Information Gain Ratio as Term Weight: The Case of Summarization of IR Results

This paper proposes a new term weighting method for summarizing documents retrieved by an IR system. Unlike query-biased summarization, our method does not use information from the query; instead, it uses the similarity structure among the retrieved documents, obtained by hierarchical clustering. To map the similarity structure of the clusters onto the weight of each word, we adopt the information gain ratio of the probability distribution of each word as its term weight.
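
To make the idea of an information gain ratio concrete, here is a minimal sketch. The abstract does not state the formula, so this assumes the standard gain ratio from decision-tree learning (information gain divided by split information). It is applied to a binary "this token is the word / is not the word" distribution when a cluster is partitioned into sub-clusters. The function names `binary_entropy` and `gain_ratio` and the input layout are hypothetical, chosen for illustration only.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-outcome distribution with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def gain_ratio(word_counts, total_counts):
    """Gain ratio of one word for a cluster split into sub-clusters.

    word_counts[i]  -- occurrences of the word in sub-cluster i (assumed input)
    total_counts[i] -- total number of tokens in sub-cluster i (assumed input)
    """
    n = sum(total_counts)
    if n == 0:
        return 0.0
    # Entropy of the word's occurrence distribution in the whole cluster.
    parent = binary_entropy(sum(word_counts) / n)
    # Size-weighted entropy of the same distribution inside each sub-cluster.
    children = sum(
        (t / n) * binary_entropy(w / t)
        for w, t in zip(word_counts, total_counts)
        if t > 0
    )
    # Split information: entropy of the partition itself.
    split_info = -sum((t / n) * math.log2(t / n) for t in total_counts if t > 0)
    if split_info == 0.0:
        return 0.0  # a single non-empty sub-cluster carries no split
    return (parent - children) / split_info


if __name__ == "__main__":
    # A word concentrated in one sub-cluster versus one spread evenly.
    print(gain_ratio([10, 0], [10, 10]))
    print(gain_ratio([5, 5], [10, 10]))
```

Under this assumed definition, a word whose occurrences are concentrated in one sub-cluster gets a high weight, while a word spread evenly across sub-clusters gets a weight near zero.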