Online Optimization in Dynamic Environments

High-velocity streams of high-dimensional data pose significant "big data" analysis challenges across a range of applications and settings. Online learning and online convex programming play a significant role in the rapid recovery of important or anomalous information from these large datastreams. While recent advances in online learning have led to novel and rapidly converging algorithms, these methods are unable to adapt to the nonstationary environments that arise in real-world problems. This paper describes a dynamic mirror descent framework which addresses this challenge, yielding low theoretical regret bounds and accurate, adaptive, and computationally efficient algorithms applicable to broad classes of problems. The methods are capable of learning and adapting to an underlying, possibly time-varying, dynamical model. Empirical results in the context of dynamic texture analysis, solar flare detection, sequential compressed sensing of a dynamic scene, traffic surveillance, tracking of self-exciting point processes, and network behavior in the Enron email corpus support the core theoretical findings.
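The core idea of incorporating a dynamical model into online learning can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's implementation): each round takes a mirror-descent step against the latest loss gradient and then advances the iterate through an assumed dynamics map `Phi`, here in the simple Euclidean case where the Bregman divergence reduces to a squared-Euclidean penalty and the update is an ordinary gradient step.

```python
import numpy as np

def dynamic_mirror_descent(grads, dynamics, theta0, eta=0.1):
    """Sketch of a dynamic mirror descent loop (Euclidean case).

    grads:    list of callables, grads[t](theta) -> gradient of the
              round-t loss at theta (revealed after the prediction).
    dynamics: callable Phi(theta) advancing the estimate one step of
              an assumed dynamical model; the identity map recovers
              standard online mirror descent.
    theta0:   initial parameter estimate.
    eta:      step size.
    """
    theta = np.asarray(theta0, dtype=float)
    trajectory = [theta.copy()]
    for g in grads:
        # Mirror-descent step: with the squared-Euclidean Bregman
        # divergence this is a plain gradient step.
        theta_tilde = theta - eta * g(theta)
        # Advance through the dynamical model before the next round.
        theta = dynamics(theta_tilde)
        trajectory.append(theta.copy())
    return trajectory
```

For example, with quadratic losses centered on a target drifting at a constant velocity, supplying that drift as `dynamics` lets the iterate track the target exactly, whereas the identity dynamics would lag behind it by one step times the learning rate's correction.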
