RARD: The Related-Article Recommendation Dataset

Recommender-system datasets are used to evaluate recommender systems, train machine-learning algorithms, and explore user behavior. Many such datasets exist for movies, books, and music, but comparatively few come from research-paper recommender systems. In this paper, we introduce RARD, the Related-Article Recommendation Dataset, from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains information about 57.4 million recommendations that were displayed to the users of Sowiport. This information includes the recommendation approach used (e.g., content-based filtering, stereotype, most popular), the type of features used in content-based filtering (simple terms vs. keyphrases), the field the features were extracted from (title or abstract), and the times at which recommendations were delivered and clicked. In addition, the dataset contains an implicit item-item rating matrix derived from the recommendation click logs. RARD enables researchers to train machine-learning algorithms for research-paper recommendation, perform offline evaluations, and study data from Mr. DLib's recommender system without implementing a recommender system themselves. Within the field of scientific recommender systems, the dataset is unique: to the best of our knowledge, no other available dataset offers more (implicit) ratings or covers as many variations of recommendation algorithms. The dataset is available at this http URL and is published under the Creative Commons Attribution 3.0 Unported (CC-BY) license.
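The following is a minimal sketch of how click logs like those described above could be turned into an implicit item-item rating matrix, and how delivery and click events can be aggregated per recommendation approach for an offline comparison. The record schema (`LogEntry` and its field names) and both helper functions are hypothetical illustrations and do not reflect RARD's actual file layout. The rating scheme (one point per click on a source/recommended pair) is likewise an assumption, not necessarily the scheme used to build RARD's matrix.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class LogEntry:
    """One delivered recommendation (hypothetical schema, not RARD's real columns)."""
    source_id: str       # document the user was viewing
    recommended_id: str  # document shown as a related article
    algorithm: str       # e.g. "content-based", "stereotype", "most-popular"
    clicked: bool        # whether the user clicked this recommendation


def implicit_item_item_matrix(log: Iterable[LogEntry]) -> Dict[Tuple[str, str], int]:
    """Assumed scheme: implicit rating = number of clicks on a (source, recommended) pair.

    Pairs that were delivered but never clicked get rating 0, which keeps them
    distinguishable from pairs that were never shown (absent keys).
    """
    ratings: Dict[Tuple[str, str], int] = {}
    for e in log:
        key = (e.source_id, e.recommended_id)
        ratings[key] = ratings.get(key, 0) + int(e.clicked)
    return ratings


def ctr_by_algorithm(log: Iterable[LogEntry]) -> Dict[str, float]:
    """Click-through rate (clicks / deliveries) for each recommendation approach."""
    delivered: Counter = Counter()
    clicks: Counter = Counter()
    for e in log:
        delivered[e.algorithm] += 1
        clicks[e.algorithm] += int(e.clicked)
    return {alg: clicks[alg] / delivered[alg] for alg in delivered}


if __name__ == "__main__":
    # Toy log with made-up document IDs, for illustration only.
    log = [
        LogEntry("A", "B", "content-based", True),
        LogEntry("A", "B", "content-based", False),
        LogEntry("A", "C", "stereotype", False),
        LogEntry("D", "B", "most-popular", True),
    ]
    print(implicit_item_item_matrix(log))  # {('A', 'B'): 1, ('A', 'C'): 0, ('D', 'B'): 1}
    print(ctr_by_algorithm(log))           # {'content-based': 0.5, 'stereotype': 0.0, 'most-popular': 1.0}
```

A dictionary keyed by pairs is used here for readability. For tens of millions of entries, a sparse matrix representation would likely be more practical.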
