A Fast Trust-Region Newton Method for Softmax Logistic Regression

With the emergence of big data, there has been growing interest in optimization routines that speed up the convergence of Logistic Regression (LR). Among many optimization methods, such as Gradient Descent, Quasi-Newton, and Conjugate Gradient, the trust-region based truncated Newton method (TRON) has been shown to converge fastest. The TRON algorithm also forms an important component of the highly efficient and widely used liblinear package. It has been shown that the WANBIA-C trick of scaling with the log of the naive Bayes conditional probabilities can greatly accelerate the convergence of LR trained with (first-order) Gradient Descent and (approximate second-order) Quasi-Newton optimization. In this work we study the applicability of the WANBIA-C trick to TRON. First, we devise a TRON algorithm that optimizes the softmax objective function and demonstrate that WANBIA-C-style preconditioning can be beneficial for TRON, leading to an extremely fast (batch) LR algorithm. Second, we present a comparative analysis of one-vs-all LR and softmax LR in terms of 0-1 Loss, Bias, Variance, RMSE, Log-Loss, and Training and Classification time, and show that softmax LR achieves significantly better RMSE and Log-Loss. We evaluate our proposed approach on 51 benchmark datasets.
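As a rough illustration of the quantities named above, the softmax objective and the WANBIA-C-style scaling can be sketched as follows; this is a minimal sketch assuming an L2-regularized $K$-class model over $N$ examples $(x_i, y_i)$ with weight vectors $w_1,\dots,w_K$, and the paper's exact notation and regularization may differ:
\[
\min_{w_1,\dots,w_K}\;\frac{\lambda}{2}\sum_{k=1}^{K}\lVert w_k\rVert_2^2
\;-\;\sum_{i=1}^{N}\log\frac{\exp\!\big(w_{y_i}^{\top}x_i\big)}{\sum_{k=1}^{K}\exp\!\big(w_k^{\top}x_i\big)}.
\]
Under the WANBIA-C trick, each raw feature value is replaced (per class $k$) by the log of the corresponding naive Bayes conditional probability estimate, e.g.
\[
\phi_{kj}(x_i)\;=\;\log\hat{P}(x_{ij}\mid y=k),
\qquad
\text{so that the class-$k$ score becomes } w_k^{\top}\phi_k(x_i),
\]
which acts as a preconditioning of the parameter space and can accelerate gradient-based and (truncated) Newton optimizers such as TRON.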