Towards Scalable Personalization

The ever-growing amount of online information calls for personalization. Among the various personalization systems, recommenders have become increasingly popular in recent years. Recommenders typically use collaborative filtering to suggest the most relevant items to their users. The most prominent challenges underlying personalization are scalability, privacy, and heterogeneity. Scalability is challenging given the growth rate of the Internet and its dynamics, both in terms of churn (i.e., users might leave or join at any time) and changes in user interests over time. Privacy is also a major concern, as users might be reluctant to expose their profiles to unknown parties (e.g., other curious users) unless they have an incentive to significantly improve their navigation experience and sufficient guarantees about their privacy. Heterogeneity poses a major technical difficulty because, to be really meaningful, the profiles of users should be extracted from a number of their navigation activities (heterogeneity of source domains) and represented in a form that is general enough to be leveraged in the context of other applications (heterogeneity of target domains). In this dissertation, we address the above-mentioned challenges. For scalability, we introduce democratization and incrementality. Our democratization approach focuses on iteratively offloading the computationally expensive tasks to the user devices (via browsers or applications). This approach achieves scalability by employing the devices of the users as additional resources, and hence the throughput of the approach (i.e., the number of updates per unit time) scales with the number of users. Our incrementality approach deals with incremental similarity metrics employing either explicit (e.g., ratings) or implicit (e.g., consumption sequences for users) feedback. This approach achieves scalability by reducing the time complexity of each update, thereby enabling higher throughput.
We tackle the privacy concerns from two perspectives, i.e., anonymity from either other curious users (user-level privacy) or the service provider (system-level privacy). We strengthen the notion of differential privacy in the context of recommenders by introducing distance-based differential privacy (D2P), which prevents curious users from even guessing any category (e.g., genre) in which a user might be interested. We also briefly introduce a recommender (X-REC) which employs a uniform user sampling technique to achieve user-level privacy and an efficient homomorphic encryption scheme (X-HE) to achieve system-level privacy.
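To make the incrementality idea above concrete, the following is a minimal sketch of an incrementally maintained similarity over explicit ratings. It is an illustration only, not the metric defined in the dissertation: the choice of cosine similarity, the class name IncrementalCosine, and its methods are all assumptions. The point it shows is that a new or changed rating only touches the users who rated the same item, so each update avoids recomputing similarities from scratch.

```python
from collections import defaultdict
from math import sqrt


class IncrementalCosine:
    """Hypothetical helper: keeps pairwise dot products and squared norms current."""

    def __init__(self):
        self.ratings = defaultdict(dict)    # user -> {item: rating}
        self.item_users = defaultdict(set)  # item -> users who rated it
        self.sq_norm = defaultdict(float)   # user -> sum of squared ratings
        self.dot = defaultdict(float)       # (u, v) with u < v -> dot product

    def update(self, user, item, rating):
        """Apply one new or changed rating in time proportional to the item's raters."""
        old = self.ratings[user].get(item, 0.0)
        delta = rating - old
        # Only users who already rated this item share a term with `user`.
        for other in self.item_users[item]:
            if other != user:
                key = (min(user, other), max(user, other))
                self.dot[key] += delta * self.ratings[other][item]
        self.sq_norm[user] += rating * rating - old * old
        self.ratings[user][item] = rating
        self.item_users[item].add(user)

    def similarity(self, u, v):
        """Cosine similarity read from the maintained aggregates."""
        key = (min(u, v), max(u, v))
        denom = sqrt(self.sq_norm.get(u, 0.0) * self.sq_norm.get(v, 0.0))
        return self.dot.get(key, 0.0) / denom if denom > 0 else 0.0


sim = IncrementalCosine()
sim.update("alice", "item_x", 4.0)
sim.update("bob", "item_x", 2.0)
sim.update("alice", "item_y", 3.0)
print(sim.similarity("alice", "bob"))
```

The sketch trades memory for update speed by storing one dot product per pair of users with at least one co-rated item; a production system would likely bound this state, for example by keeping only each user's nearest neighbors.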
