Accurate and early prediction of user lifespan in an online video-on-demand system

Online video-on-demand (VoD) services are now widespread. Predicting user lifespan in a VoD system helps service providers characterize users' churn risk and take steps to retain them. A systematic study of this problem is desirable but absent from the literature. We address this problem using a large-scale dataset of user watching behavior from PPTV, one of the largest online VoD systems in China. The dataset covers 27 weeks and involves more than 10 million users. We analyze how users' watching behavior and preferences evolve over their lifespans and make several observations. Over a user's lifespan, some activity metrics, such as visit frequency, number of views, and finishing ratio, follow inverted U-shaped curves, whereas the user's preference for popular video content, which we call the Popular Video Preference (PVP), decreases with time. Because many users leave the system very quickly, e.g., after only one week, it is necessary to predict early whether a user will have a long lifespan, based on a short rather than a long behavior history. We propose applying machine learning methods to make this prediction from users' first-week behavior records. Experimental results show that the most relevant feature is the visit frequency; adding the PVP feature improves the F1-score of prediction by 8.8%, with the F1-score reaching 0.74 at best. Our proposed model and the PVP metric can help VoD service providers predict user lifespan and take measures to retain users at an early stage of their time in the system.
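
As a rough illustration of the first-week features and the evaluation metric mentioned above, the following Python sketch derives two per-user features from viewing records and computes an F1-score. It is not the paper's implementation. The function names, the record layout `(user_id, day, video_id)`, and in particular the PVP proxy (the share of a user's first-week views that fall on the `top_k` most-viewed videos) are assumptions made for illustration only; the abstract does not give the exact definitions.

```python
from collections import Counter


def first_week_features(views, top_k=1):
    """Compute per-user first-week features from viewing records.

    views: iterable of (user_id, day, video_id), where `day` is counted
           from the user's first visit (0-based). Hypothetical layout.
    Returns {user_id: {"visit_days": int, "pvp": float}}.
    """
    # Keep only records from the first seven days of each user's lifespan.
    week = [(u, d, v) for (u, d, v) in views if 0 <= d < 7]

    # Assumed PVP proxy: "popular" means among the top_k most-viewed videos.
    popularity = Counter(v for _, _, v in week)
    popular = {v for v, _ in popularity.most_common(top_k)}

    days, total, hits = {}, Counter(), Counter()
    for u, d, v in week:
        days.setdefault(u, set()).add(d)  # distinct active days
        total[u] += 1                     # all first-week views
        if v in popular:
            hits[u] += 1                  # views of popular videos

    return {
        u: {"visit_days": len(days[u]), "pvp": hits[u] / total[u]}
        for u in total
    }


def f1_score(y_true, y_pred):
    """F1-score (harmonic mean of precision and recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage with made-up records (not data from the paper).
views = [
    ("a", 0, "hit"), ("a", 1, "hit"), ("a", 3, "niche"),
    ("a", 9, "hit"),                    # outside the first week, ignored
    ("b", 0, "hit"), ("b", 0, "niche"),
]
features = first_week_features(views, top_k=1)
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

In a setting like the one described, such features would be fed to a classifier that labels each user as long-lived or short-lived, and the F1-score would be computed on held-out users. The choice of classifier and of the lifespan threshold is left open here.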