Subgroup analysis using Bernoulli‐gated hierarchical mixtures of experts models

When a treatment effect is suspected to be strong only in certain subpopulations, identifying the baseline covariate profiles of the subgroups that benefit from the treatment is of key importance. In this paper, we propose an approach to subgroup analysis that first introduces Bernoulli‐gated hierarchical mixtures of experts (BHME), a binary‐tree structured model for exploring heterogeneity of the underlying distribution. We show identifiability of the BHME model and develop an EM‐based maximum likelihood method for optimization. The algorithm automatically determines a partition structure that is optimal for prediction but possibly suboptimal for identifying treatment effect heterogeneity. We therefore suggest a testing‐based postscreening step to further capture effect heterogeneity. Simulation results show that our approach outperforms competing methods in discovering differential treatment effects and on other related metrics. Finally, we apply the proposed approach to a real dataset from Tennessee's Student/Teacher Achievement Ratio project.
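
To make the idea of a Bernoulli gate fitted by EM concrete, the following is a minimal sketch and not the estimation procedure of the paper. It assumes the simplest possible tree (a single split), a logistic gate that sends each observation to the left expert with probability g(x), two linear experts with Gaussian noise, and a gate update done by a few gradient-ascent steps. The function name `fit_two_expert_mixture` and all parameter names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Clip the argument to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_two_expert_mixture(X, y, n_iter=50, seed=0):
    """Illustrative EM for a depth-one Bernoulli-gated mixture.

    Assumptions (not taken from the paper): logistic gate, two linear
    experts with Gaussian noise, gradient-ascent update for the gate.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.column_stack([np.ones(n), X])        # design matrix with intercept
    gamma = rng.normal(scale=0.1, size=d + 1)    # gate coefficients
    beta = rng.normal(size=(2, d + 1))           # one coefficient row per expert
    sigma2 = np.full(2, y.var())                 # noise variance per expert

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each expert.
        g = sigmoid(Xb @ gamma)                  # P(left expert | x)
        prior = np.column_stack([g, 1.0 - g])
        resid = y[:, None] - Xb @ beta.T
        log_lik = -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * resid**2 / sigma2
        log_post = np.log(prior + 1e-12) + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)
        r = np.exp(log_post)
        r /= r.sum(axis=1, keepdims=True)

        # M-step for the experts: weighted least squares per expert.
        for k in range(2):
            w = r[:, k]
            A = Xb.T @ (w[:, None] * Xb) + 1e-8 * np.eye(d + 1)
            beta[k] = np.linalg.solve(A, Xb.T @ (w * y))
            res = y - Xb @ beta[k]
            sigma2[k] = max((w * res**2).sum() / max(w.sum(), 1e-12), 1e-6)

        # M-step for the gate: gradient ascent on the weighted
        # Bernoulli (logistic) log-likelihood with targets r[:, 0].
        for _ in range(20):
            g = sigmoid(Xb @ gamma)
            gamma += 0.5 * Xb.T @ (r[:, 0] - g) / n

    return gamma, beta, sigma2
```

In this sketch, calling `fit_two_expert_mixture(X, y)` with an `(n, d)` covariate array and a length-`n` outcome returns the gate coefficients, the two experts' coefficients, and their noise variances. A deeper binary tree would place a gate of this kind at every internal node, and the testing-based postscreening step described above is not represented here.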
