Dynamic Online Learning via Frank-Wolfe Algorithm

Online convex optimization (OCO) encapsulates supervised learning when training sets are large-scale or dynamic, and has grown essential as data has proliferated. OCO decomposes learning into a sequence of sub-problems, each of which must be solved with limited information. To ensure safe model adaptation or to avoid overfitting, constraints are often imposed, and these are typically handled with projections or Lagrangian relaxation. To avoid the added complexity, we study Frank-Wolfe (FW), which updates in directions collinear with the gradient while guaranteeing that iterates remain feasible. We focus on its use in non-stationary settings, motivated by the fact that its iterates have structured sparsity that may be employed as a distribution-free change-point detector. We establish performance in terms of dynamic regret, which quantifies the cost accumulated relative to the optimum at each individual time slot. Specifically, for convex losses, we establish ${\mathcal O}(T^{1/2})$ dynamic regret up to metrics of non-stationarity. We then relax the information required by the algorithm to only noisy gradient estimates, i.e., partial feedback. We also consider a mini-batching variant, 'Meta-Frank-Wolfe', and characterize its dynamic regret. Experiments on a matrix completion problem and on background separation in video demonstrate favorable performance of the proposed scheme. Moreover, the structured sparsity of FW is experimentally observed to yield the sharpest tracker of change points among the alternative approaches to non-stationary online convex optimization that were compared.
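To make the projection-free update and the dynamic-regret metric concrete, the following is a minimal sketch and not the algorithm analyzed in the paper. It assumes an l1-ball constraint set, quadratic losses $f_t(x) = \frac{1}{2}\|x - \theta_t\|^2$, and a constant step size; the function names `lmo_l1_ball` and `online_frank_wolfe` are hypothetical and chosen only for illustration.

```python
import numpy as np

def lmo_l1_ball(grad, radius):
    """Linear minimization oracle: argmin over ||s||_1 <= radius of <grad, s>.

    The minimizer is a signed, scaled coordinate vector, so each FW atom is 1-sparse.
    """
    s = np.zeros_like(grad)
    i = int(np.argmax(np.abs(grad)))
    s[i] = -radius * np.sign(grad[i])
    return s

def online_frank_wolfe(targets, radius=1.0, step=0.5):
    """One Frank-Wolfe step per time slot on f_t(x) = 0.5 * ||x - targets[t]||^2.

    Assumptions (illustrative only): l1-ball constraint, constant step size.
    Returns the final iterate and the loss incurred at every time slot.
    """
    x = np.zeros(targets.shape[1])
    losses = []
    for theta in targets:
        # Loss is incurred at the current iterate, before f_t is revealed.
        losses.append(0.5 * np.sum((x - theta) ** 2))
        grad = x - theta                   # gradient of f_t at x
        s = lmo_l1_ball(grad, radius)      # no projection, only a linear problem
        x = (1.0 - step) * x + step * s    # convex combination stays feasible
    return x, np.array(losses)

# Toy non-stationary stream: the per-slot minimizer jumps at t = 50.
targets = np.vstack([np.tile([1.0, 0.0, 0.0], (50, 1)),
                     np.tile([0.0, -1.0, 0.0], (50, 1))])
x_final, losses = online_frank_wolfe(targets)

# Each target lies in the constraint set, so every per-slot optimum has zero
# loss and the dynamic regret is simply the sum of the incurred losses.
dynamic_regret = losses.sum()
```

In this toy stream the per-slot loss spikes at the slot where the target jumps and then decays again, which is the kind of signature a change-point detector could key on.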
