Quickest convergence of online algorithms via data selection

Big data applications demand efficient solvers that provide accurate solutions to large-scale problems at affordable computational cost. By processing data sequentially, online algorithms offer an attractive means of dealing with massive data sets. However, they may incur prohibitive complexity in high-dimensional scenarios if the entire data set is processed. It is therefore necessary to confine computations to an informative subset of the data. While existing approaches have focused on selecting a prescribed fraction of the available data vectors, the present paper exploits this degree of freedom to accelerate the convergence of a generic class of online algorithms, in terms of processing time and computational resources, by balancing the required computational burden against a metric of how informative each datum is. The proposed method is illustrated in a linear regression setting, and simulations corroborate the faster convergence of the recursive least-squares (RLS) algorithm when the novel data selection scheme is employed.
