Using Word Embeddings for Recommending Datasets based on Scientific Publications

In scholarly search systems, computing recommendations of the same type, for example suggesting additional publications while a user reads a particular publication, is a well-studied problem. However, suggesting items of another type, e.g., research data while reading a publication, is rarely covered in scholarly recommendation. In this position paper, we employ word embeddings to approach such cross-domain recommendations in scientific search systems, more specifically, recommending research data based on publications. Besides various other metadata, publication and research dataset entries comprise textual metadata (e.g., title and abstract), which makes it possible to detect similar entries using word embeddings. We present first results, major problems, and possible solutions when using word embeddings to recommend datasets based on publications.
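To make the core idea concrete, the following is a minimal sketch of ranking datasets against a publication by comparing embeddings of their textual metadata. It assumes one common, simple approach: each text (title plus abstract) is represented as the mean of its word vectors, and candidates are ranked by cosine similarity. This is not necessarily the exact pipeline used in the paper. The toy two-dimensional vectors and the dataset identifiers are hypothetical. A real system would load pretrained embeddings (e.g., word2vec) trained on a large corpus.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical word-vector lookup: word -> dense vector of fixed dimension.
WordVectors = Dict[str, List[float]]


def tokenize(text: str) -> List[str]:
    # Lowercase, replace non-alphanumeric characters with spaces, split on whitespace.
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in text)
    return cleaned.split()


def embed(text: str, vectors: WordVectors, dim: int) -> List[float]:
    # Represent a text as the mean of the vectors of its in-vocabulary words;
    # out-of-vocabulary words are ignored. Returns a zero vector if none are known.
    known = [vectors[tok] for tok in tokenize(text) if tok in vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity; defined as 0.0 when either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(publication: str, datasets: Dict[str, str],
              vectors: WordVectors, dim: int, k: int = 3) -> List[Tuple[str, float]]:
    # Embed the publication's textual metadata once, then score every dataset
    # by the cosine similarity of its own metadata embedding; return the top k.
    query = embed(publication, vectors, dim)
    scored = [(ds_id, cosine(query, embed(text, vectors, dim)))
              for ds_id, text in datasets.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy vectors: dimension 0 ~ "politics", dimension 1 ~ "economics".
    toy_vectors: WordVectors = {
        "election": [1.0, 0.0],
        "voting": [0.9, 0.1],
        "survey": [0.5, 0.5],
        "income": [0.0, 1.0],
        "wages": [0.1, 0.9],
    }
    toy_datasets = {
        "ds-politics": "Election survey 2017",
        "ds-labour": "Income and wages panel",
    }
    for ds_id, score in recommend("Voting behaviour in the federal election",
                                  toy_datasets, toy_vectors, dim=2):
        print(ds_id, round(score, 3))
```

Averaging word vectors is only a baseline. Document-level models such as paragraph vectors are an alternative that this sketch does not cover. The zero-vector fallback matters in practice because dataset metadata is often short, so few of its words may appear in the embedding vocabulary.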
