Logistic regression with total variation regularization

We study logistic regression with a total variation penalty on the canonical parameter and show that the resulting estimator satisfies a sharp oracle inequality: the excess risk of the estimator adapts to the number of jumps of the underlying signal or of an approximation thereof. In particular, when there are finitely many jumps and jumps up are sufficiently separated from jumps down, the estimator converges at the parametric rate up to a logarithmic term, namely $\log n / n$, provided the tuning parameter is chosen appropriately, of order $1/\sqrt{n}$. Our results extend earlier results for quadratic loss to logistic loss. We do not assume any a priori known bounds on the canonical parameter; instead, we only make use of the local curvature of the theoretical risk.
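
To make the object of study concrete, the following is a minimal sketch of a penalized objective of the kind described above. It is an illustration under stated assumptions, not the paper's own formulation: it assumes binary responses $Y_i \in \{0, 1\}$ observed on a one-dimensional ordered design, a canonical parameter vector $f \in \mathbb{R}^n$, the logistic loss $-Y_i f_i + \log(1 + e^{f_i})$ averaged over $i$, and the total variation $\sum_{i \ge 2} |f_i - f_{i-1}|$ as penalty. The function name `tv_logistic_objective`, the simulated signal, and the constant in front of $1/\sqrt{n}$ are all hypothetical choices.

```python
import numpy as np

def tv_logistic_objective(f, y, lam):
    """Mean logistic loss plus lam times the total variation of f (assumed form)."""
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)
    # Logistic loss in the canonical parameter: -y*f + log(1 + exp(f)).
    # np.logaddexp(0, f) evaluates log(1 + exp(f)) in a numerically stable way.
    loss = np.mean(-y * f + np.logaddexp(0.0, f))
    # Total variation of the sequence: sum of absolute successive differences.
    tv = np.sum(np.abs(np.diff(f)))
    return loss + lam * tv

# Hypothetical example: a piecewise constant signal with a single jump.
n = 200
rng = np.random.default_rng(0)
f_true = np.where(np.arange(n) < n // 2, -1.0, 1.0)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-f_true)))

# Tuning parameter of order 1/sqrt(n); the constant 1 is an arbitrary choice.
lam = 1.0 / np.sqrt(n)

obj_true = tv_logistic_objective(f_true, y, lam)
obj_zero = tv_logistic_objective(np.zeros(n), y, lam)
```

The sketch only evaluates the objective. Minimizing it over $f$ is a convex problem that could be handed to a generic convex solver; no particular algorithm is implied by the abstract.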
