TF-IDuF: A Novel Term-Weighting Scheme for User Modeling Based on Users’ Personal Document Collections

TF-IDF is one of the most popular term-weighting schemes and is used by search engines, recommender systems, and user-modeling engines. With regard to user modeling and recommender systems, we see two shortcomings of TF-IDF. First, calculating IDF requires access to the document corpus from which recommendations are made, and such access is not always available in a user-modeling or recommender system. Second, TF-IDF ignores information from a user’s personal document collection, which, we hypothesize, could enhance the user-modeling process. In this paper, we introduce TF-IDuF, a term-weighting scheme that does not require access to the general document corpus and that considers information from the users’ personal document collections. We evaluated TF-IDuF against TF-IDF and TF-Only for recommending research papers and found that TF-IDF and TF-IDuF perform similarly (click-through rates (CTR) of 5.09% vs. 5.14%), and both achieve around 25% higher CTR than TF-Only (4.06%). Consequently, we conclude that TF-IDuF could be a promising term-weighting scheme, especially when access to the document corpus for recommendations is not possible and classic IDF therefore cannot be computed. Notably, TF-IDuF and TF-IDF are not mutually exclusive, so both metrics may be combined into a more effective term-weighting scheme.
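To make the contrast with classic IDF concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that IDuF mirrors the classic IDF formula, log(N_u / n_u(t)), where N_u is the number of documents in the user's personal collection and n_u(t) is the number of those documents containing term t; the abstract does not state the exact formula. The function names `iduf` and `tf_iduf`, the zero weight for terms absent from the collection, and the top-k cut-off are hypothetical choices made for illustration. The point of the sketch is that only the user's own documents are needed, never the recommendation corpus.

```python
import math
from collections import Counter


def iduf(term: str, user_docs: list[set[str]]) -> float:
    # Hypothetical IDuF: the classic IDF formula log(N_u / n_u(t)), but N_u and
    # n_u(t) are counted over the user's personal collection, not the corpus.
    n_u = len(user_docs)
    df_u = sum(1 for doc in user_docs if term in doc)
    # Assumption: a term that never occurs in the personal collection gets weight 0.
    return math.log(n_u / df_u) if df_u else 0.0


def tf_iduf(user_terms: list[str], user_docs: list[list[str]], k: int = 3) -> list[tuple[str, float]]:
    # TF: raw count of each term in the text the user model is built from.
    tf = Counter(user_terms)
    doc_sets = [set(doc) for doc in user_docs]
    weights = {t: count * iduf(t, doc_sets) for t, count in tf.items()}
    # Keep the k highest-weighted terms as the user model (ties broken alphabetically).
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical personal collection of three tokenized documents.
user_docs = [
    ["recommender", "systems", "evaluation"],
    ["recommender", "mind", "maps"],
    ["recommender", "citation", "analysis"],
]
user_terms = ["recommender", "recommender", "mind", "maps", "citation"]
model = tf_iduf(user_terms, user_docs)
print(model)  # citation, maps, mind, each weighted log(3)
```

In this toy run, "recommender" occurs in every document of the user's collection, so its IDuF is log(3/3) = 0 and it drops out of the user model despite having the highest TF, while the rarer terms each receive log(3) ≈ 1.10.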
