Revisit of Logistic Regression: Efficient Optimization and Kernel Extensions

Logistic regression (LR) is widely used as a classification method in various fields, and a variety of optimization methods have been developed for it. To cope with large-scale problems, an optimization method for LR must be efficient in both computational cost and memory usage. In this paper, we propose an efficient optimization method based on non-linear conjugate gradient (CG) descent. In each CG iteration, the proposed method employs an optimized step size without an exhaustive line search, which significantly reduces the number of iterations and speeds up the whole optimization process. In addition, building on this CG-based optimization scheme, we propose a novel optimization method for kernel logistic regression (KLR). Unlike ordinary KLR methods, which optimize the linear coefficients, the proposed method directly optimizes the kernel-based classifier in the reproducing kernel Hilbert space (RKHS), where the classifier is naturally formulated as a linear combination of sample kernel functions. We further propose multiple-kernel logistic regression (MKLR) as an extension of the KLR optimization. MKLR effectively combines multiple types of kernels while optimizing the kernel weights within the logistic regression framework. The proposed methods are all based on CG optimization and matrix-matrix computations, which are easily parallelized, for example by multi-threaded programming. In experiments on multi-class classification using various datasets, the proposed methods exhibit favorable performance in terms of classification accuracy and computation time.
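To make the idea of CG descent with a cheaply computed step size concrete, the following is a minimal sketch for binary, L2-regularized LR. It is not the method proposed in the paper: the abstract does not state the step-size rule, so this sketch assumes a single Newton step along the search direction, a Polak-Ribiere update with restart, and a simple halving safeguard. The function names (`lr_loss_grad`, `fit_lr_cg`) and the parameter `lam` are hypothetical.

```python
import numpy as np

def lr_loss_grad(w, X, y, lam):
    """Regularized logistic loss, its gradient, and predicted probabilities (y in {0, 1})."""
    z = X @ w
    p = 0.5 * (1.0 + np.tanh(0.5 * z))  # numerically stable sigmoid
    loss = np.sum(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * (w @ w)
    grad = X.T @ (p - y) + lam * w
    return loss, grad, p

def fit_lr_cg(X, y, lam=1.0, max_iter=100, tol=1e-6):
    """Non-linear CG for binary LR; step size from one 1-D Newton step (an assumption)."""
    w = np.zeros(X.shape[1])
    loss, g, p = lr_loss_grad(w, X, y, lam)
    direction = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Curvature of the loss along the direction: d^T (X^T diag(p(1-p)) X + lam I) d.
        Xd = X @ direction
        curvature = Xd @ (p * (1.0 - p) * Xd) + lam * (direction @ direction)
        step = -(g @ direction) / curvature
        # Safeguard (assumption): halve the step if the loss would increase.
        for _ in range(20):
            w_new = w + step * direction
            loss_new, g_new, p_new = lr_loss_grad(w_new, X, y, lam)
            if loss_new <= loss:
                break
            step *= 0.5
        # Polak-Ribiere coefficient, clipped at zero.
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        direction = -g_new + beta * direction
        if g_new @ direction >= 0.0:  # not a descent direction: restart
            direction = -g_new
        w, loss, g, p = w_new, loss_new, g_new, p_new
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_demo = rng.normal(size=(200, 3))
    y_demo = (X_demo @ np.array([2.0, -1.0, 0.5]) > 0).astype(float)
    print(fit_lr_cg(X_demo, y_demo))
```

Each iteration costs a few matrix-vector products with `X` and stores only a handful of vectors, which is the kind of cost and memory profile the abstract targets; the multi-class and kernel variants described above are not covered by this sketch.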
