GPU Accelerated Feature Engineering and Training for Recommender Systems

In this paper we present our first-place solution to the RecSys Challenge 2020, which focused on predicting user behavior, specifically interactions with content, on this year's dataset from competition host Twitter. Our approach achieved the highest score in seven of the eight metrics used to calculate the final leaderboard position. The 200 million tweet dataset required significant computation for feature engineering and for preparing the data for modelling, and our winning solution leveraged several key tools to accelerate the training pipeline. Within the paper we describe our exploratory data analysis (EDA) and training, the final features and models used, and the acceleration of the pipeline. Our final implementation runs entirely on GPU, including feature engineering, preprocessing, and model training. Starting from an initial single-threaded Pandas implementation that took over ten hours, we accelerated feature engineering, preprocessing, and training to two minutes and eighteen seconds, an end-to-end speedup of over 280x, using a combination of RAPIDS cuDF, Dask, UCX, and XGBoost on a single machine with four NVIDIA V100 GPUs. Even when compared to heavily optimized code written later using Dask and Pandas on a 20-core CPU, our solution was still 25x faster. The acceleration of our pipeline was critical to our ability to quickly perform EDA, which led to the discovery of a range of effective features used in the final solution, which is provided as open source [16].
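To make the GPU-acceleration pattern concrete, the following is a minimal illustrative sketch (not the authors' actual code): a frequency-count feature, a common kind of engineered feature in recommender pipelines, computed with the pandas API. Because RAPIDS cuDF mirrors this API, swapping `import pandas as pd` for `import cudf as pd` moves the same computation to the GPU, and XGBoost can then train on the result with its GPU histogram tree method. The column names and toy data below are hypothetical.

```python
import pandas as pd  # replace with `import cudf as pd` to run on GPU via RAPIDS

def add_count_feature(df, key, name):
    """Attach a frequency-count feature: how often each value of `key` appears."""
    counts = df.groupby(key).size().rename(name)
    return df.merge(counts, left_on=key, right_index=True, how="left")

# Hypothetical toy interaction log standing in for the 200M-tweet dataset.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "engaged": [1, 0, 1, 0, 1, 1],
})
df = add_count_feature(df, "user_id", "user_count")
# df["user_count"] is now [2, 2, 1, 3, 3, 3]
# XGBoost could then train on such features entirely on GPU, e.g. with
# xgboost.train({"tree_method": "gpu_hist", ...}, dtrain).
```

Because the cuDF and pandas APIs largely coincide, the same feature-engineering code can be profiled on CPU and then deployed on GPU with minimal changes, which is what makes the end-to-end GPU pipeline practical.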