Identifying Users behind Shared Accounts in Online Streaming Services

Online streaming services are prevalent. Major service providers, such as Netflix (for movies) and Spotify (for music), usually have a large customer base, and users often share a single account. Account sharing has attracted increasing attention recently, as it not only compromises the service provider's financial interests but also impairs the performance of recommendation systems and, consequently, the quality of service provided to users. To address this issue, this paper focuses on the problem of user identification in shared accounts. Our goal is three-fold: (1) given an account and its historical session logs, identify the set of users who share that account; (2) given a new session issued by an account, find the corresponding user among the identified users of that account; (3) use the identified users to boost the performance of item recommendation. Because the mapping between users and accounts is unknown, we propose an unsupervised learning-based framework, Session-based Heterogeneous graph Embedding for User Identification (SHE-UI), to differentiate and model the preferences of the users in an account and to group sessions by these users. In SHE-UI, a heterogeneous graph is constructed to represent items, such as songs, together with their available metadata, such as artists, genres, and albums. An item-based session embedding technique is proposed that uses a normalized random walk in the heterogeneous graph. Experiments on two large-scale music streaming datasets, Last.fm and KKBOX, show that SHE-UI not only identifies users accurately but also significantly improves item recommendation performance over state-of-the-art methods.
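To make the pipeline described above more concrete, the following is a minimal sketch of the general idea, not the SHE-UI implementation. It assumes a toy catalogue (`SONGS`), reads "normalized random walk" as a walk whose transition probabilities are normalized by node degree, averages the walk distributions of a session's items to embed the session, and groups sessions with a simple greedy cosine-similarity threshold. All names, data, the step count, and the threshold are hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy catalogue: song -> metadata nodes (artist, genre).
SONGS = {
    "s1": ("artist:A", "genre:rock"),
    "s2": ("artist:A", "genre:rock"),
    "s3": ("artist:B", "genre:rock"),
    "s4": ("artist:C", "genre:pop"),
    "s5": ("artist:C", "genre:pop"),
    "s6": ("artist:D", "genre:pop"),
}

# Hypothetical session logs of one shared account (lists of songs).
SESSIONS = [["s1", "s2"], ["s4", "s5"], ["s2", "s3"], ["s5", "s6"]]


def build_graph(songs):
    """Build an undirected heterogeneous graph linking songs to metadata."""
    graph = defaultdict(set)
    for song, metadata in songs.items():
        for node in metadata:
            graph[song].add(node)
            graph[node].add(song)
    return graph


def walk_distribution(graph, start, steps=2):
    """Exact node distribution after `steps` degree-normalized walk steps."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, prob in dist.items():
            share = prob / len(graph[node])  # uniform over neighbours
            for neighbour in graph[node]:
                nxt[neighbour] += share
        dist = dict(nxt)
    return dist


def embed_session(graph, session, steps=2):
    """Embed a session as the mean walk distribution of its items."""
    emb = defaultdict(float)
    for item in session:
        for node, prob in walk_distribution(graph, item, steps).items():
            emb[node] += prob / len(session)
    return dict(emb)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_sessions(embeddings, threshold=0.5):
    """Greedy grouping: join the first group whose seed is similar enough."""
    groups = []
    for i, emb in enumerate(embeddings):
        for group in groups:
            if cosine(emb, embeddings[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


if __name__ == "__main__":
    graph = build_graph(SONGS)
    embeddings = [embed_session(graph, s) for s in SESSIONS]
    print(group_sessions(embeddings))
```

In this sketch each resulting group stands for one hypothesized user of the account, and a new session could be assigned to the group whose seed embedding it is most similar to. An even number of walk steps is used so that the walk ends on song nodes; a real system would choose the walk length, the similarity measure, and the clustering method empirically.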
