Learning Combination Features with L1 Regularization

When linear classifiers cannot successfully classify data, we often add combination features, which are products of several original features. The searching for effective combination features, namely feature engineering, requires domain-specific knowledge and hard work. We present herein an efficient algorithm for learning an L1 regularized logistic regression model with combination features. We propose to use the grafting algorithm with efficient computation of gradients. This enables us to find optimal weights efficiently without enumerating all combination features. By using L1 regularization, the result we obtain is very compact and achieves very efficient inference. In experiments with NLP tasks, we show that the proposed method can extract effective combination features, and achieve high performance with very few features.