Lecture Notes on Online Learning
DRAFT

These lecture notes contain material presented in the Statistical Learning Theory course at UC Berkeley, Spring '08. Various parts of these notes have been discovered together with …

CONTENTS

…
4 Bandit Problems
5 Minimax Results and Lower Bounds
6 Variance Bounds
7 Stochastic Approximation

CHAPTER ONE
INTRODUCTION

The past two decades have witnessed a surge of activity on prediction and learning methods in adversarial environments. Progress on this topic has been made in various fields, with many methods independently discovered and rediscovered. In their recent book, Nicolò Cesa-Bianchi and Gábor Lugosi [5] collected and organized many of these results under a common umbrella. We are indebted to this book for our own interest in the field, which seemed very fragmented before Nicolò and Gábor's effort.

That being said, we feel that it might be beneficial to organize the ideas in a manner different from [5]. The purpose of these lecture notes is to stress the role of regularization as a common umbrella for some of the known online learning methods. While many of the results mentioned here are not novel, we hope to give the reader a fresh perspective through a very natural formulation.

We start with the time-varying potential method of Section 11.6 of [5], which, we feel, is one of the most general results of the book. The versatility of the method is obscured by the fact that it is hidden in the middle of a chapter on "linear pattern recognition". In contrast, we would like to bring out this result in a generic setting of convex loss functions and show how various other algorithms arise from this formulation.

Another motivation for these notes is the realization that the time-varying potential method is nothing more than a sequence of regularized empirical error minimizations. The latter is the basis for most batch machine learning methods, such as the SVM, Lasso, etc. It is, therefore, very natural to start with an algorithm which minimizes the regularized empirical loss at every step of the online interaction with the environment, as sketched at the end of this chapter. This provides a connection between online and batch learning which is conceptually important.

We also point the reader to the recent thesis of Shai Shalev-Shwartz [9, 10]. The primal-dual view of online updates is illuminating and leads to new algorithms; however, the focus of these notes is slightly different.

Let $K \subseteq \mathbb{R}^n$, the set of …
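To fix ideas, the regularized step referred to above can be written schematically as a minimization over the set $K$. The symbols used here, $\ell_s$ for the convex loss revealed at round $s$, $R$ for the regularization function, and $\eta_t > 0$ for a scaling parameter, are illustrative placeholders rather than the notation fixed later in these notes:
\[
w_{t+1} \;=\; \arg\min_{w \in K} \left\{ \sum_{s=1}^{t} \ell_s(w) \;+\; \frac{1}{\eta_t}\, R(w) \right\}.
\]
Different choices of the regularizer $R$ (and of the scaling $\eta_t$) then give rise to different online algorithms, which is precisely the sense in which regularization serves as a common umbrella in the chapters that follow.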