Learning with Missing Features

We introduce new online and batch algorithms that are robust to data with missing features, a situation that arises in many practical applications. In the online setup, we allow the comparison hypothesis to change as a function of the subset of features observed on any given round, extending the standard setting in which the comparison hypothesis is fixed throughout. In the batch setup, we present a convex relaxation of a non-convex problem to jointly estimate an imputation function, used to fill in the values of missing features, along with the classification hypothesis. We prove regret bounds in the online setting and Rademacher complexity bounds for the batch i.i.d. setting. The algorithms are tested on several UCI datasets, showing superior performance over baseline imputation methods.

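To make the online setting concrete, the sketch below shows a simple baseline in Python: online subgradient descent on the hinge loss with a linear predictor, where each round reveals only a subset of the features and unobserved coordinates are ignored. This is an illustrative sketch under assumed names (online_learning_with_missing_features, synthetic_stream, step size eta), not the paper's algorithm or its per-round comparator analysis.

    import numpy as np

    def online_learning_with_missing_features(stream, d, eta=0.1):
        """stream yields (x, mask, y) with y in {-1, +1}; mask marks observed features.
        Unobserved entries of x are treated as zero, i.e. they do not affect the prediction."""
        w = np.zeros(d)
        mistakes = 0
        for x, mask, y in stream:
            x_obs = np.where(mask, x, 0.0)        # keep only observed features
            margin = y * np.dot(w, x_obs)
            if margin <= 0:
                mistakes += 1                     # count prediction errors
            if margin < 1:                        # hinge-loss subgradient step
                w[mask] += eta * y * x_obs[mask]  # update observed coordinates only
        return w, mistakes

    # Usage on synthetic data where each feature is observed with probability 0.7.
    rng = np.random.default_rng(0)
    d = 20
    w_true = rng.normal(size=d)

    def synthetic_stream(n=1000):
        for _ in range(n):
            x = rng.normal(size=d)
            mask = rng.random(d) < 0.7
            y = 1 if np.dot(w_true, np.where(mask, x, 0.0)) >= 0 else -1
            yield x, mask, y

    w_hat, mistakes = online_learning_with_missing_features(synthetic_stream(), d)
    print(mistakes)

Updating only the observed coordinates keeps the per-round cost proportional to the number of revealed features; the zero-fill convention stands in for the learned imputation function described in the batch setup.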