Paper information - Effects of central tendency measures on term weighting in textual information retrieval

Term weighting has a significant effect on the retrieval of relevant documents, and various weighting methods have been proposed for it. The main question that arises is which weighting method is best. This paper shows that properly aggregating the weights generated by carefully selected basic weighting methods improves the retrieval of documents relevant to the user’s needs. In particular, applying even simple central tendency measures such as the average, median, or mid-range over an appropriate subset of basic weighting methods yields term weights that not only outperform each basic weighting method used alone but are also more effective than recently proposed, more complicated weighting methods. By applying the proposed method to various datasets, we study the effects of normalizing the basic weights, normalizing the vector lengths, using different components in the term frequency factor, and related choices. The results reveal criteria for selecting an appropriate subset of basic weighting methods to feed to the aggregator in order to achieve higher retrieval precision.
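The aggregation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`max_normalize`, `mid_range`, `aggregate_term_weights`), the choice of max-normalization, the method labels, and the toy weights are all assumptions made for illustration. Each basic weighting method is assumed to have already produced one weight per term for a single document; the weights are normalized per method and then combined per term with a central tendency measure.

```python
from statistics import mean, median


def max_normalize(weights):
    """Scale one method's term weights into [0, 1] by dividing by the largest weight.

    Max-normalization is an assumption here; the abstract only says that
    the basic weights are normalized.
    """
    top = max(weights.values())
    return {term: (w / top if top else 0.0) for term, w in weights.items()}


def mid_range(values):
    """Mid-range: the midpoint between the smallest and largest value."""
    return (min(values) + max(values)) / 2


# Central tendency measures named in the abstract.
AGGREGATORS = {"average": mean, "median": median, "mid-range": mid_range}


def aggregate_term_weights(weights_by_method, measure="average"):
    """Combine per-method term weights into a single weight per term.

    weights_by_method maps a (hypothetical) method name to a dict of
    term -> weight for one document. A term missing from a method's
    output is treated as having weight 0.0 for that method.
    """
    aggregator = AGGREGATORS[measure]
    normalized = [max_normalize(w) for w in weights_by_method.values()]
    terms = set().union(*(w.keys() for w in normalized))
    return {
        term: aggregator([w.get(term, 0.0) for w in normalized])
        for term in terms
    }


# Toy weights for one document; the numbers are invented for illustration.
example_weights = {
    "tf": {"retrieval": 3.0, "term": 1.0},
    "tf_idf": {"retrieval": 2.0, "term": 4.0},
    "log_tf_idf": {"retrieval": 1.0, "term": 2.0},
}

for name in AGGREGATORS:
    print(name, aggregate_term_weights(example_weights, name))
```

Which basic methods belong in the subset passed to the aggregator is exactly what the paper studies, so the three methods above should be read as placeholders rather than a recommended selection.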
