A Large-Scale Rich Context Query and Recommendation Dataset in Online Knowledge-Sharing

Data plays a vital role in machine learning research. In recommendation research, both user behaviors and side information help to model users, so large-scale datasets from real scenarios with abundant user behaviors are highly valuable. However, such datasets are hard to obtain because most of them are held and protected by companies. In this paper, we present a new large-scale dataset collected from a knowledge-sharing platform. It comprises around 100M interactions collected within 10 days, 798K users, 165K questions, 554K answers, 240K authors, 70K topics, and more than 501K user query keywords. It also includes anonymized descriptions of users, answers, questions, authors, and topics. Notably, each user’s latest query keywords, which reveal the user’s explicit information needs, have not been included in previous open datasets. We characterize the dataset and demonstrate its potential applications for recommendation research. Multiple experiments show that the dataset can be used to evaluate algorithms for general top-N recommendation, sequential recommendation, and context-aware recommendation. It can also be used to study the integration of search and recommendation, as well as recommendation with negative feedback. In addition, tasks beyond recommendation, such as user gender prediction, most valuable answerer identification, and high-quality answer recognition, can use this dataset. To the best of our knowledge, this is the largest real-world interaction dataset for personalized recommendation.
