Paper Information - The 2021 RecSys Challenge Dataset: Fairness is not optional

The 2021 RecSys Challenge Dataset: Fairness is not optional

After the success of the RecSys 2020 Challenge, we describe a new and larger dataset that was released in conjunction with the ACM RecSys Challenge 2021. This year’s dataset is not only bigger (~1B data points, a 5-fold increase), but for the first time it also takes fairness aspects of the challenge into consideration. Unlike many static datasets, a lot of effort went into making sure that the dataset stayed in sync with the Twitter platform: if a user deleted their content, the same content would be promptly removed from the dataset too. In this paper, we introduce the dataset and challenge, highlighting some of the issues that arise when creating recommender systems at Twitter scale.
