Catboost-based Framework with Additional User Information for Social Media Popularity Prediction

In this paper, a CatBoost-based framework is proposed to predict social media popularity. The framework consists of two components: feature representation and CatBoost training. In the feature representation component, numerical features are used directly, while categorical features are converted into numerical ones by CatBoost's ordered target statistics. In addition, some additional user information is tracked to enrich the feature space. In the training component, CatBoost is adopted as the regression model and is trained on post-related features, user-related features, and the additional user information. Moreover, to make full use of the dataset for model training, a dataset augmentation strategy based on pseudo labels is proposed. This strategy involves two-stage training. In the first stage, a first-stage model is trained and used to assign pseudo labels to the test set. In the second stage, a final model is trained on a new training set that includes the original validation set and the pseudo-labeled test set. The proposed method achieved 2nd place on the leaderboard of the Grand Challenge of Social Media Prediction.
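The two techniques named in the abstract, ordered target statistics for categorical features and two-stage pseudo-label training, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation and makes several simplifying assumptions. The ordered target statistics use a single random permutation with no target quantization, whereas CatBoost itself uses several permutations and computes these statistics internally for columns passed through its `cat_features` parameter. `LinearStandIn` is a hypothetical one-feature least-squares regressor that stands in for CatBoost so the example runs without dependencies. The function names, the prior value `prior` and its weight `a`, and the choice of which labeled data feed each stage are all illustrative assumptions. Under this sketch, row k with category c is encoded as (sum of targets of earlier rows in the permutation with category c + a·prior) / (count of those rows + a).

```python
import random
from typing import List, Sequence


def ordered_target_statistics(
    cats: Sequence[str], y: Sequence[float], prior: float, a: float = 1.0, seed: int = 0
) -> List[float]:
    """Encode one categorical training column with ordered target statistics.

    Rows are visited in a random permutation; row k is encoded only from rows
    visited before it:
        (sum of earlier targets with the same category + a * prior)
        / (count of earlier rows with the same category + a)
    Simplified sketch: a single permutation and no target quantization.
    """
    order = list(range(len(cats)))
    random.Random(seed).shuffle(order)
    sums, counts = {}, {}
    encoded = [0.0] * len(cats)
    for k in order:
        c = cats[k]
        encoded[k] = (sums.get(c, 0.0) + a * prior) / (counts.get(c, 0) + a)
        # Add row k's label only after encoding it, so it never sees its own target.
        sums[c] = sums.get(c, 0.0) + y[k]
        counts[c] = counts.get(c, 0) + 1
    return encoded


def encode_unseen(
    cats_train: Sequence[str], y_train: Sequence[float],
    cats_new: Sequence[str], prior: float, a: float = 1.0,
) -> List[float]:
    """Encode new (e.g. test) rows using statistics from all labeled rows."""
    sums, counts = {}, {}
    for c, t in zip(cats_train, y_train):
        sums[c] = sums.get(c, 0.0) + t
        counts[c] = counts.get(c, 0) + 1
    # Categories never seen in the labeled rows fall back to the prior.
    return [(sums.get(c, 0.0) + a * prior) / (counts.get(c, 0) + a) for c in cats_new]


class LinearStandIn:
    """Hypothetical stand-in for CatBoost: least-squares line on one feature."""

    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> "LinearStandIn":
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (t - my) for x, t in zip(xs, ys))
        self.slope = cov / var if var else 0.0
        self.intercept = my - self.slope * mx
        return self

    def predict(self, xs: Sequence[float]) -> List[float]:
        return [self.intercept + self.slope * x for x in xs]


def two_stage_pseudo_label(make_model, stage1, stage2_labeled, x_test):
    """Two-stage training with pseudo labels (hypothetical helper).

    Stage 1: fit on `stage1` = (xs, ys) and predict pseudo labels for x_test.
    Stage 2: fit a fresh model on `stage2_labeled` plus the pseudo-labeled test rows.
    """
    x1, y1 = stage1
    pseudo = make_model().fit(x1, y1).predict(x_test)
    x2, y2 = stage2_labeled
    final = make_model().fit(list(x2) + list(x_test), list(y2) + list(pseudo))
    return final, pseudo


if __name__ == "__main__":
    # Hypothetical toy data: one categorical column and popularity-like targets.
    cats = ["travel", "food", "travel", "pets", "food", "travel"]
    y = [6.1, 4.0, 5.8, 7.2, 4.4, 6.5]
    prior = sum(y) / len(y)
    x_train = ordered_target_statistics(cats, y, prior)
    x_test = encode_unseen(cats, y, ["food", "pets", "art"], prior)
    # Assumed split: first four rows act as the training set, last two as the validation set.
    final, pseudo = two_stage_pseudo_label(
        LinearStandIn, (x_train[:4], y[:4]), (x_train[4:], y[4:]), x_test
    )
    print("pseudo labels:", pseudo)
    print("final predictions:", final.predict(x_test))
```

Because each training row is encoded only from rows that precede it in the permutation, a row's own label never enters its own encoding. This is the target leakage that ordered statistics are designed to avoid. In this sketch, the second stage treats pseudo labels exactly like real labels.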
