Paper information - Optimal Algorithms for Ridge and Lasso Regression with Partially Observed Attributes

We consider the most common variants of linear regression, including Ridge, Lasso and Support-vector regression, in a setting where the learner is allowed to observe only a fixed number of attributes of each example at training time. We present simple and efficient algorithms for these problems. For Lasso and Ridge regression, they need the same total number of attributes (up to constants) as full-information algorithms to reach a given accuracy. For Support-vector regression, we require exponentially fewer attributes than the state of the art. This resolves an open problem recently posed by Cesa-Bianchi et al. (2010). Experiments support the theoretical bounds, showing superior performance compared to the state of the art.
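The limited-attribute setting described above can be illustrated with a minimal sketch. This is not taken from the paper and should not be read as the authors' algorithm. It only shows, assuming uniform sampling of attributes and the squared loss ½(w·x − y)², how an unbiased gradient estimate might be formed while reading k + 1 attributes of an example. The function names `sample_attributes` and `gradient_estimate` are hypothetical.

```python
import random


def sample_attributes(x, k, rng):
    """Unbiased estimate of x that reads only k attributes of x.

    Assumption: attributes are drawn uniformly at random, with replacement.
    """
    d = len(x)
    x_hat = [0.0] * d
    for _ in range(k):
        i = rng.randrange(d)        # index of the single attribute read here
        x_hat[i] += d * x[i] / k    # E[d * x[i] * e_i] = x, averaged over k draws
    return x_hat


def gradient_estimate(w, x, y, k, rng):
    """Unbiased estimate of the squared-loss gradient (w.x - y) * x.

    Reads k attributes to estimate x and one more, drawn independently,
    to estimate the inner product w.x (k + 1 attributes in total).
    """
    d = len(x)
    x_hat = sample_attributes(x, k, rng)
    j = rng.randrange(d)
    phi = d * w[j] * x[j]           # E[phi] = w.x
    # phi and x_hat are independent, so the product is unbiased.
    return [(phi - y) * v for v in x_hat]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, -2.0, 0.5, 3.0]
    w = [0.5, 0.1, -0.3, 0.2]
    print(gradient_estimate(w, x, 1.0, 2, rng))
```

A single estimate is noisy, but averaging many of them approaches the true gradient, which is what allows a stochastic gradient method to make progress with only a few attributes per example.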

Elad Hazan | Tomer Koren

[1] Ohad Shamir, et al. Efficient Learning with Partially Observed Attributes, 2010, ICML.

[2] Elad Hazan, et al. Logarithmic regret algorithms for online convex optimization, 2006, Machine Learning.

[3] Ambuj Tewari, et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization, 2008, NIPS.

[4] David Haussler, et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications, 1992, Inf. Comput.

[5] Shai Ben-David, et al. Learning with Restricted Focus of Attention, 1998, J. Comput. Syst. Sci.

[6] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[7] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[8] Manfred K. Warmuth, et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors, 1997, Inf. Comput.

[9] Po-Ling Loh, et al. High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity, 2011, NIPS.

[10] Peter L. Bartlett, et al. Learning with Missing Features, 2011, UAI.

[11] David P. Woodruff, et al. Sublinear Optimization for Machine Learning, 2010, FOCS.

[12] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.