Paper information - Online censoring for large-scale regressions

Online censoring for large-scale regressions

With 2.5 quintillion bytes of data generated every day, the era of Big Data is undoubtedly upon us. Nonetheless, a significant percentage of the accrued data can be omitted while maintaining a certain quality of statistical inference under a limited computational budget. In this context, adaptively estimating high-dimensional signals from massive, sequentially observed data is challenging, yet important in practice. The present paper addresses this challenge with a novel approach that leverages interval censoring for data reduction. An online maximum likelihood, least mean-square (LMS)-type algorithm, and an online support vector regression algorithm are developed for censored data. The proposed algorithms entail simple, low-complexity, closed-form updates and have provably bounded regret. Simulated tests corroborate their efficacy.
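The abstract does not spell out the update rules, so the following is only a minimal sketch of the general idea of censoring inside an LMS-type recursion, not the paper's algorithm. It assumes a linear model with known noise level `sigma` and a hypothetical censoring rule that skips a sample whenever its prediction error falls within `tau * sigma`. The function name `censored_lms` and the parameters `step`, `tau`, and `sigma` are illustrative choices, not names from the paper.

```python
import numpy as np


def censored_lms(X, y, step=0.01, tau=1.0, sigma=1.0):
    """LMS-style pass that skips (censors) samples with small prediction error.

    Assumed rule (illustrative, not taken from the paper): a sample is
    censored when |y_n - x_n^T theta| <= tau * sigma, so it triggers no update.
    """
    theta = np.zeros(X.shape[1])
    n_updates = 0
    for x_n, y_n in zip(X, y):
        err = y_n - x_n @ theta          # prediction error with current estimate
        if abs(err) <= tau * sigma:
            continue                     # censored: deemed uninformative, skipped
        theta = theta + step * err * x_n  # standard LMS correction
        n_updates += 1
    return theta, n_updates


# Synthetic linear-regression stream (all values below are assumptions).
rng = np.random.default_rng(0)
dim, n_samples, sigma = 5, 20000, 0.1
theta_true = rng.standard_normal(dim)
X = rng.standard_normal((n_samples, dim))
y = X @ theta_true + sigma * rng.standard_normal(n_samples)

theta_hat, n_updates = censored_lms(X, y, step=0.05, tau=1.5, sigma=sigma)
print("fraction of samples used:", n_updates / n_samples)
print("estimation error:", np.linalg.norm(theta_hat - theta_true))
```

In this sketch, raising `tau` discards more samples and lowers the per-sample cost, at the price of a coarser estimate. That is the trade-off between data reduction and inference quality described in the abstract.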
