You Are What You Watch and When You Watch: Inferring Household Structures From IPTV Viewing Data

What you watch and when you watch it say a lot about you, and this information, aggregated across a user population, provides significant insights for social and commercial applications. In this paper, we propose a model for inferring household structures by analyzing users' viewing behaviors in Internet Protocol Television (IPTV) systems. We focus on extracting features of viewing behavior from the dynamics of watching time and TV programs, and on training a classifier that infers household structures from these features. In the training phase, instead of relying only on the limited labeled samples, we apply a semisupervised learning strategy to obtain a graph-based model that classifies household structures from users' features. We test the proposed model on China Telecom IPTV data and demonstrate its utility in census research and system simulation. The demographic characteristics inferred by our approach match well with the population census data of Shanghai, and the inferred household structures of IPTV users give encouraging results when compared with the ground truth obtained from surveys. This opens the door to using IPTV viewing data as a complement to time- and resource-consuming census tracking. In addition, the proposed model can synthesize trace data for simulations of IPTV systems, which offers a new strategy for system simulation.
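
The abstract does not spell out the graph-based semisupervised model, so the following is only a minimal sketch of one common formulation (harmonic-function label propagation over a Gaussian similarity graph), not the authors' method. The function name, the `sigma` bandwidth, the use of `-1` to mark unlabeled households, and the toy feature vectors are all assumptions made for illustration.

```python
import numpy as np


def propagate_household_labels(features, labels, sigma=1.0):
    """Hypothetical helper: graph-based label propagation.

    features: (n, d) array of per-household viewing features (assumed given).
    labels:   (n,) int array; -1 marks an unlabeled household (assumed convention).
    Returns a copy of labels with the unlabeled entries filled in.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    labeled = labels >= 0
    unlabeled = ~labeled
    classes = np.unique(labels[labeled])

    # Gaussian (RBF) similarity between every pair of households.
    sq_dist = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # One-hot label matrix for the labeled households.
    Y_l = (labels[labeled][:, None] == classes[None, :]).astype(float)

    # Harmonic solution: F_u = L_uu^{-1} W_ul Y_l.
    L_uu = L[np.ix_(unlabeled, unlabeled)]
    W_ul = W[np.ix_(unlabeled, labeled)]
    F_u = np.linalg.solve(L_uu, W_ul @ Y_l)

    predicted = labels.copy()
    predicted[unlabeled] = classes[F_u.argmax(axis=1)]
    return predicted


# Toy usage with synthetic feature vectors (not real IPTV data):
toy_features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
toy_labels = [0, -1, -1, 1, -1, -1]
print(propagate_household_labels(toy_features, toy_labels))
```

In this sketch only two of the six toy households carry a label; the others receive the label of the cluster they are most strongly connected to in the similarity graph, which is the general idea behind using unlabeled samples alongside a small labeled set.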
