I Know You'll Be Back: Interpretable New User Clustering and Churn Prediction on a Mobile Social Application

As online platforms strive to attract more users, user churn is a critical challenge, and it is especially concerning among new users. In this paper, taking anonymous large-scale real-world data from Snapchat as an example, we develop ClusChurn, a systematic two-step framework for interpretable new user clustering and churn prediction, based on the intuition that proper user clustering can help understand and predict user churn. ClusChurn first groups new users into interpretable typical clusters based on their activities on the platform and their ego-network structures. We then design a novel deep learning pipeline based on LSTM and attention that accurately predicts user churn from very limited initial behavior data by leveraging the correlations among users' multi-dimensional activities and the underlying user types. ClusChurn can also predict user types, which enables rapid reactions to different types of user churn. Extensive data analysis and experiments show that ClusChurn provides valuable insight into user behaviors and achieves state-of-the-art churn prediction performance. The whole framework is deployed as a data analysis pipeline, delivering real-time data analysis and prediction results to multiple relevant teams for business intelligence uses. It is also general enough to be readily adopted by any online system with user behavior data.
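
To make the first step more concrete, the following is a minimal sketch of grouping new users into clusters from activity and ego-network features. It is an illustration under stated assumptions, not the ClusChurn implementation: the abstract does not name a clustering algorithm, so plain k-means is used here as a stand-in, and the feature names, the toy values, and the `kmeans` helper are all hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation.

    Assumption: k-means is only a stand-in; the abstract does not say which
    clustering method ClusChurn uses.
    """
    # Start from the first point, then repeatedly add the point that is
    # farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each user goes to the nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical per-user features: (mean daily chats sent, ego-network degree).
users = [
    (0.5, 2.0), (0.7, 3.0), (0.4, 2.5),     # low-activity, small ego-networks
    (9.0, 40.0), (8.5, 38.0), (9.5, 42.0),  # high-activity, large ego-networks
]

labels, centroids = kmeans(users, k=2)
print(labels)
print(centroids)
```

On this toy input the two groups separate cleanly. In practice the features would likely need scaling before distances are compared, and the resulting cluster labels could then serve as the user types that the prediction step conditions on.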
