Supervised dimension reduction for ordinal predictors

In applications involving ordinal predictors, dimensionality is commonly reduced either by extensions of unsupervised techniques such as principal component analysis or by variable selection procedures that rely on modeling the regression function. This paper introduces a supervised dimension reduction method tailored to ordered categorical predictors. The method is model-based and extends sufficient dimension reduction to the setting of latent Gaussian variables. The reduction is chosen without modeling the response as a function of the predictors, and it imposes no distributional assumption on the response or on the response given the predictors. A likelihood-based estimator of the reduction is derived, and an iterative expectation-maximization type algorithm is proposed to lessen the computational load and make the method more practical. A regularized estimator that simultaneously achieves variable selection and dimension reduction is also presented. The performance of the proposed method is evaluated through simulations and a real data example on socioeconomic index construction, where it compares favorably with widely used techniques.
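The latent Gaussian view of an ordinal predictor can be illustrated with a small sketch. It is not the estimator proposed in the paper. It only shows the standard identification step in which each observed category is treated as an interval of a latent normal variable, here assumed (for illustration) to have mean zero and unit variance, so that the cut points follow from the observed cumulative proportions. The function names `latent_thresholds` and `discretize` and the toy data are hypothetical.

```python
from collections import Counter
from statistics import NormalDist


def latent_thresholds(x, levels):
    """Estimate cut points of a standard-normal latent variable.

    Assumption (illustrative): the latent variable is N(0, 1), so the
    k-th cut point is the normal quantile of the cumulative proportion
    of observations at or below the k-th level.
    """
    n = len(x)
    counts = Counter(x)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    thresholds = []
    cumulative = 0
    for level in levels[:-1]:  # the last level has no upper cut point
        cumulative += counts.get(level, 0)
        # Clip away from 0 and 1 so empty categories do not give infinite cuts.
        p = min(max(cumulative / n, 1e-6), 1 - 1e-6)
        thresholds.append(std_normal.inv_cdf(p))
    return thresholds


def discretize(z, thresholds, levels):
    """Map a latent value z to the ordinal level whose interval contains it."""
    for cut, level in zip(thresholds, levels):
        if z <= cut:
            return level
    return levels[-1]


# Toy ordinal predictor with three ordered levels (hypothetical data).
levels = [1, 2, 3]
x = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
cuts = latent_thresholds(x, levels)
print([round(c, 4) for c in cuts])
print([discretize(z, cuts, levels) for z in (-1.0, 0.0, 1.0)])
```

With several ordinal predictors, the same step would be applied to each one separately; modeling the dependence among the latent variables and their relation to the response is the part the likelihood-based estimator and the expectation-maximization type algorithm address.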
