MGLM: An R Package for Multivariate Categorical Data Analysis

Data with multiple responses are ubiquitous in modern applications, yet few tools are available for regression analysis of multivariate counts. The most popular choice, the multinomial-logit model, has a very restrictive mean-variance structure, which limits its applicability to many data sets. This article introduces the R package MGLM, short for multivariate response generalized linear models, which expands the current tools for regression analysis of polytomous data. Distribution fitting, random number generation, regression, and sparse regression are treated in a unifying framework. The algorithms, usage, and implementation details are discussed.
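To make the remark about the restrictive mean-variance structure concrete, the following is a minimal sketch in Python rather than R. It compares the textbook variance of a multinomial count, which is fully determined by the mean, with that of a Dirichlet-multinomial count that has the same mean. The choice of the Dirichlet-multinomial as the comparison model, and the function names, batch size, and parameter values, are illustrative assumptions and are not taken from the article.

```python
# Illustrative sketch only: function names and parameter values are hypothetical.

def multinomial_variance(n, p):
    """Var(X_j) = n * p_j * (1 - p_j): fixed once the mean n * p_j is fixed."""
    return [n * pj * (1 - pj) for pj in p]


def dirichlet_multinomial_variance(n, alpha):
    """Var(X_j) = n * p_j * (1 - p_j) * (n + a0) / (1 + a0), with p_j = alpha_j / a0."""
    a0 = sum(alpha)
    p = [a / a0 for a in alpha]
    inflation = (n + a0) / (1 + a0)  # exceeds 1 whenever n > 1
    return [n * pj * (1 - pj) * inflation for pj in p]


n = 20                   # assumed batch size (total count per observation)
alpha = [1.0, 2.0, 3.0]  # assumed Dirichlet-multinomial parameters
p = [a / sum(alpha) for a in alpha]  # matching multinomial probabilities

mn_var = multinomial_variance(n, p)
dm_var = dirichlet_multinomial_variance(n, alpha)

for j, (v_mn, v_dm) in enumerate(zip(mn_var, dm_var)):
    print(f"category {j}: multinomial var = {v_mn:.3f}, "
          f"Dirichlet-multinomial var = {v_dm:.3f}")
```

Both models share the same category means here, but the Dirichlet-multinomial variance is inflated by the factor (n + a0) / (1 + a0). A multinomial-logit model cannot capture overdispersion of this kind.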
