Document Embeddings vs. Keyphrases vs. Terms for Recommender Systems: A Large-Scale Online Evaluation

Operators of digital library recommender systems can choose from many recommendation algorithms, but the effectiveness of these algorithms is rarely reported from online evaluations. We compare a standard term-based recommendation approach to two promising approaches for related-article recommendation in digital libraries: document embeddings and keyphrases. We evaluate the consistency of their performance across multiple scenarios. Through our recommender-system-as-a-service Mr. DLib, we delivered 33.5M recommendations to users of Sowiport and Jabref over the course of 19 months, from March 2017 to October 2018. The effectiveness of the algorithms differs significantly between Sowiport and Jabref (Wilcoxon rank-sum test; p < 0.05). Within each scenario, there is a ~400% difference in effectiveness between the best and worst algorithm. The best-performing algorithm in Sowiport (terms) is the worst-performing in Jabref. The best-performing algorithm in Jabref (keyphrases) is 70% worse in Sowiport than Sowiport's best algorithm (click-through rate: 0.1% for terms, 0.03% for keyphrases).
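
The 70% figure follows directly from the two click-through rates quoted for Sowiport. The sketch below reproduces that arithmetic; the function names are hypothetical and chosen for illustration only, and the definition of click-through rate as clicks divided by delivered recommendations is an assumption, not something spelled out in the abstract.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Fraction of delivered recommendations that were clicked (assumed definition)."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def relative_drop(best: float, other: float) -> float:
    """How much lower `other` is than `best`, as a fraction of `best`."""
    if best <= 0:
        raise ValueError("best must be positive")
    return (best - other) / best


# Click-through rates reported in the abstract for Sowiport, in percent.
ctr_terms = 0.1        # best algorithm in Sowiport
ctr_keyphrases = 0.03  # best algorithm in Jabref, evaluated in Sowiport

drop = relative_drop(ctr_terms, ctr_keyphrases)
print(f"keyphrases are {drop:.0%} worse than terms in Sowiport")
```

Because the drop is a ratio, it is the same whether the rates are entered as percentages (0.1, 0.03) or as fractions (0.001, 0.0003).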
