Bandits Games and Clustering Foundations

This thesis falls within machine learning theory. In particular, it focuses on three sub-domains: stochastic optimization, online learning, and clustering. These subjects have existed for decades, but all have recently been studied from a new perspective. For instance, bandits games now offer a unified framework for stochastic optimization and online learning, and this point of view gives rise to many new extensions of the basic game. The first part of this thesis is devoted to the mathematical study of these extensions (as well as of the classical game). The second part discusses two important theoretical concepts for clustering, namely the consistency of algorithms and stability as a tool for model selection.
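To make the basic game concrete, the following is a minimal sketch of the classical stochastic multi-armed bandit, played with the standard UCB1 index policy (empirical mean plus an exploration bonus). The Bernoulli arm means, horizon, and seed are illustrative choices, not values taken from the thesis.

```python
import math
import random

def ucb1(arms, horizon, seed=0):
    """Play a stochastic multi-armed bandit with the UCB1 index policy.

    arms: list of mean rewards of Bernoulli arms (unknown to the player).
    Returns the cumulative regret against always playing the best arm.
    """
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k          # number of pulls per arm
    sums = [0.0] * k          # total observed reward per arm
    reward_total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1         # pull each arm once to initialize
        else:
            # UCB1 index: empirical mean + sqrt(2 log t / n_i) bonus
            a = max(range(k), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        r = 1.0 if rng.random() < arms[a] else 0.0
        counts[a] += 1
        sums[a] += r
        reward_total += r
    return horizon * max(arms) - reward_total

regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(regret)  # regret grows only logarithmically with the horizon
```

The exploration bonus shrinks as an arm is pulled more often, which is exactly the exploration-exploitation trade-off that the extensions studied in the first part of the thesis (pure exploration, continuous-armed, adversarial settings) generalize in various directions.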
