PIANO: A Fast Parallel Iterative Algorithm for Multinomial and Sparse Multinomial Logistic Regression

Multinomial Logistic Regression is a well-studied tool for classification that has been widely used in fields such as image processing, computer vision, and bioinformatics, to name a few. In a supervised classification setting, a Multinomial Logistic Regression model learns a weight vector to differentiate between any two classes by optimizing the likelihood objective. With the advent of big data, the inundation of data has led to high-dimensional weight vectors and a very large number of classes, which makes the classical methods for model estimation computationally inviable. To handle this issue, we propose a parallel iterative algorithm: Parallel Iterative Algorithm for MultiNomial LOgistic Regression (PIANO), which is based on the Majorization Minimization procedure and can update every element of the weight vectors in parallel. Further, we show that PIANO can be easily extended to solve the Sparse Multinomial Logistic Regression problem, which has been studied extensively because of its attractive feature-selection property. In particular, we work out extensions of PIANO that solve the Sparse Multinomial Logistic Regression problem with ℓ1 and ℓ0 regularizations. We also prove that PIANO converges to a stationary point of both the Multinomial and the Sparse Multinomial Logistic Regression problems. Simulations comparing PIANO with existing methods show that the proposed algorithm converges faster.
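The abstract does not reproduce PIANO's update equations, but the idea it describes (a Majorization Minimization scheme whose surrogate is separable across the entries of the weight vectors, so each entry has an independent closed-form update) can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's algorithm: the function names mm_multinomial_logreg and softmax are hypothetical, and the surrogate used is a generic uniform quadratic majorizer built from a Böhning-style curvature bound on the negative log-likelihood, which is almost certainly looser than the majorizer PIANO constructs.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with a max shift for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def mm_multinomial_logreg(X, y, n_classes, n_iter=500, lam_l1=0.0):
    """Illustrative MM scheme for (sparse) multinomial logistic regression.

    Each iteration minimizes a separable quadratic majorizer of the
    negative log-likelihood built from a uniform curvature bound
    (Bohning-style: Hessian <= 0.5 * sigma_max(X)^2 * I), so every
    entry of the weight matrix W has an independent closed-form
    update that could be computed in parallel. This is a generic
    sketch, not PIANO's actual majorizer.
    """
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]                      # one-hot labels in {0,...,C-1}, shape (n, C)
    kappa = 0.5 * np.linalg.norm(X, ord=2) ** 2   # uniform curvature bound (largest singular value squared, halved)
    for _ in range(n_iter):
        G = X.T @ (softmax(X @ W) - Y)            # gradient of the negative log-likelihood
        V = W - G / kappa                         # unconstrained minimizer of the majorizer
        if lam_l1 > 0.0:
            # With an l1 penalty the separable majorizer is minimized
            # entrywise by soft-thresholding.
            W = np.sign(V) * np.maximum(np.abs(V) - lam_l1 / kappa, 0.0)
        else:
            W = V
    return W

# Hypothetical usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = rng.integers(0, 3, size=200)
W = mm_multinomial_logreg(X, y, n_classes=3, lam_l1=0.1)
```

For the ℓ0-regularized variant mentioned in the abstract, the same separable majorizer would instead be minimized by hard-thresholding: keep an entry V_jc only when (kappa / 2) * V_jc**2 exceeds the penalty weight, i.e. |V_jc| > sqrt(2 * lam_l0 / kappa). Again, this threshold follows from the generic surrogate assumed here, not from the specific construction in the paper.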
