Online Censoring for Large-Scale Regressions with Application to Streaming Big Data

In step with the growth of data-intensive applications, the sheer size of modern linear regression problems creates an ever-growing demand for efficient solvers. Fortunately, a significant percentage of the accrued data can be omitted while maintaining a prescribed quality of statistical inference within an affordable computational budget. This work introduces means of identifying and omitting less informative observations in an online and data-adaptive fashion. Given streaming data, the related maximum-likelihood estimator is found sequentially using first- and second-order stochastic approximation algorithms. These schemes are well suited to settings where data are inherently censored or where the aim is to reduce communication overhead in decentralized learning setups. In a different operational scenario, the task of joint censoring and estimation is put forth to solve large-scale linear regressions in a centralized setup. Novel online algorithms are developed that enjoy simple closed-form updates and provable (non)asymptotic convergence guarantees. Thresholding rules are also investigated for attaining desired censoring patterns and levels of dimensionality reduction. Numerical tests on real and synthetic datasets corroborate the efficacy of the proposed data-adaptive methods compared to data-agnostic random projection-based alternatives.
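
To make the idea of online, data-adaptive censoring concrete, the following is a minimal sketch of a first-order (LMS-style) recursion that skips observations whose prediction error is small relative to the noise level. It is an illustration under stated assumptions, not the algorithm developed in this work: the function names, the censoring rule `|err| <= tau * sigma`, the fixed step size, and the synthetic data are all hypothetical, and the sketch simply ignores censored samples rather than forming the maximum-likelihood estimator that accounts for them.

```python
import random

def censored_lms(stream, dim, step=0.05, tau=1.0, sigma=1.0):
    """LMS-style recursion that skips (censors) low-innovation samples.

    Hypothetical illustration: censored samples are ignored entirely.
    """
    theta = [0.0] * dim
    used = 0
    total = 0
    for x, y in stream:
        total += 1
        # Innovation: prediction error under the current estimate.
        err = y - sum(xi * ti for xi, ti in zip(x, theta))
        # Assumed censoring rule: drop the sample if the error is
        # within tau noise standard deviations.
        if abs(err) <= tau * sigma:
            continue
        used += 1
        # Stochastic-gradient step on the squared error.
        theta = [ti + step * err * xi for xi, ti in zip(x, theta)]
    return theta, used, total

def synthetic_stream(theta_true, n, sigma=1.0, seed=0):
    """Yield (x, y) pairs from a linear model with Gaussian noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in theta_true]
        y = sum(xi * ti for xi, ti in zip(x, theta_true)) + rng.gauss(0.0, sigma)
        yield x, y

theta_true = [2.0, -1.0, 0.5]
theta_hat, used, total = censored_lms(
    synthetic_stream(theta_true, 5000, sigma=0.1),
    dim=3, step=0.05, tau=1.0, sigma=0.1,
)
print(f"updates performed: {used} of {total}")
print("estimate:", [round(t, 3) for t in theta_hat])
```

Raising `tau` censors more of the stream and saves more updates, at the cost of a coarser estimate; this is the trade-off that thresholding rules are meant to control.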
