Data-Driven Recommender Systems

This document discusses scalable and reliable methods for recommender systems from a machine-learning point of view. In particular, it addresses some difficulties that arise in the non-stationary case.
