Multinomial Logistic Regression Ensembles

This article proposes a method for multiclass classification using ensembles of multinomial logistic regression models. A multinomial logit model serves as the base classifier, and the ensemble is built from random partitions of the predictors. The predictors are split at random into mutually exclusive subsets, and a multinomial logit model is fitted to each subset without variable selection. By combining many such models, the method can handle very large data sets without the dimensionality constraints that a single model faces on high-dimensional data. The random partitioning can also improve prediction accuracy by reducing the correlation among base classifiers. The method is implemented in R. Its performance is evaluated on two real data sets and on simulated data, including overall prediction accuracy and the sensitivity and specificity for each category. To assess the quality of prediction in terms of sensitivity and specificity, the area under the receiver operating characteristic (ROC) curve (AUC) is also examined. Compared with a single multinomial logit model, the proposed method shows a substantial improvement in overall prediction accuracy. It is also compared with other classification methods, including random forests, support vector machines, and the random multinomial logit model.
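The core idea of partitioning the predictors at random, fitting one multinomial logit per subset, and combining the fits can be sketched as follows. This is a minimal NumPy illustration, not the authors' R implementation. The names `MultinomialLogitEnsemble`, `fit_multinomial_logit`, `random_partition`, `n_subsets` and `n_partitions` are hypothetical. Three choices here are assumptions of the sketch because the text above does not specify them: repeating the partition several times, averaging predicted class probabilities to combine members, and fitting each base model by plain gradient descent.

```python
import numpy as np


def softmax(Z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def add_intercept(X):
    """Prepend a column of ones so each class gets its own intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_multinomial_logit(X, y, n_classes, lr=0.05, n_iter=500):
    """Fit a multinomial logit (softmax regression) by batch gradient descent.

    Assumption: a fixed number of plain gradient steps on the mean negative
    log-likelihood stands in for a proper maximum-likelihood optimizer.
    Labels y must be integers coded 0..n_classes-1.
    """
    Xb = add_intercept(X)
    W = np.zeros((Xb.shape[1], n_classes))
    Y = np.eye(n_classes)[y]                        # one-hot encoding of labels
    for _ in range(n_iter):
        P = softmax(Xb @ W)
        W -= lr * (Xb.T @ (P - Y)) / Xb.shape[0]    # gradient of mean NLL
    return W


def random_partition(n_features, n_subsets, rng):
    """Randomly split feature indices into mutually exclusive subsets."""
    return np.array_split(rng.permutation(n_features), n_subsets)


class MultinomialLogitEnsemble:
    """Hypothetical ensemble of multinomial logit models on random partitions."""

    def __init__(self, n_subsets=4, n_partitions=5, seed=0):
        self.n_subsets = n_subsets          # disjoint predictor subsets per partition
        self.n_partitions = n_partitions    # assumption: partitioning is repeated
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.n_classes = int(y.max()) + 1
        self.members = []                   # list of (feature indices, coefficients)
        for _ in range(self.n_partitions):
            for idx in random_partition(X.shape[1], self.n_subsets, self.rng):
                W = fit_multinomial_logit(X[:, idx], y, self.n_classes)
                self.members.append((idx, W))
        return self

    def predict_proba(self, X):
        # Assumption: members are combined by averaging class probabilities.
        probs = [softmax(add_intercept(X[:, idx]) @ W) for idx, W in self.members]
        return np.mean(probs, axis=0)

    def predict(self, X):
        return self.predict_proba(X).argmax(axis=1)
```

The sketch fits each base model on only a fraction of the predictors. Choosing `n_subsets` so that every subset has fewer predictors than training samples is what allows each multinomial logit to be fitted without prior variable selection, even when the full predictor count exceeds the sample size.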
