Parallel Large-Scale Neural Network Training For Online Advertising

Neural networks have achieved great success in many fields. Due to the complexity of the training pipeline, however, using them in an industrial setting is challenging. In online advertising, the complexity arises from the immense size of the training data and the dimensionality of the sparse feature space (both can reach hundreds of billions). To tackle these challenges, we built TrainSparse (TS), a system that parallelizes the training of neural networks with a focus on efficiently handling large-scale sparse features. In this paper, we present the design and implementation of TS, and show the effectiveness of the system by applying it to predict the ad conversion rate (pCVR), one of the key problems in online advertising. We also compare several methods for dimensionality reduction on sparse features in the pCVR task. Experiments on real-world industry data show that TS achieves outstanding performance and scalability.
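The abstract does not specify which dimensionality-reduction methods are compared, and no code accompanies this page. As a minimal illustrative sketch only, the snippet below shows the hashing trick, one widely used way to map a very large sparse categorical feature space into a fixed-size index space before feeding it to a neural network. The bucket count and feature names are assumptions for illustration, not details from the paper.

```python
# Sketch of the hashing trick for large-scale sparse categorical features.
# NUM_BUCKETS and the field names below are hypothetical; the paper's actual
# feature pipeline and reduction methods are not described on this page.
import hashlib

NUM_BUCKETS = 2 ** 20  # assumed hash space, far smaller than the raw feature space


def hash_feature(field: str, value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a (field, value) pair to a bucket index in [0, num_buckets)."""
    key = f"{field}={value}".encode("utf-8")
    return int(hashlib.md5(key).hexdigest(), 16) % num_buckets


def featurize(example: dict) -> list[int]:
    """Convert one impression's raw categorical features to sorted bucket indices."""
    return sorted({hash_feature(field, str(value)) for field, value in example.items()})


# Toy ad impression with sparse categorical fields (hypothetical names).
impression = {"advertiser_id": "a_98231", "site_domain": "news.example.com", "device": "ios"}
print(featurize(impression))  # e.g. three bucket indices in [0, 2**20)
```

The resulting bucket indices can then index an embedding table of size NUM_BUCKETS, which keeps the model's parameter count bounded regardless of how many raw feature values appear in the training data.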
