Poststratification fusion learning in longitudinal data analysis

Stratification is widely used in biomedical studies to handle sample heterogeneity arising from, for example, clinical units, patient subgroups, or missing data. A key rationale for this approach is to overcome potential sampling biases in statistical inference. Two issues with stratification-based strategies are (i) whether individual strata are sufficiently distinct to warrant stratification, and (ii) whether the sample size attrition resulting from stratification leads to a loss of statistical power. To address these issues, we propose a penalized generalized estimating equations (GEE) approach that reduces the complexity of parametric model structures caused by excessive stratification. Specifically, we develop a data-driven fusion learning approach for longitudinal data that improves estimation efficiency by integrating information across similar strata, while still allowing the separation needed for stratum-specific conclusions. The proposed method is evaluated in simulation studies and applied to a motivating psychiatric study to demonstrate its usefulness in real-world settings.
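
To make the idea of fusing similar strata concrete, the following is a minimal sketch of one generic form that a fusion-penalized, stratified GEE can take. All notation here is assumed for illustration and does not come from the abstract: K strata, stratum-specific coefficient vectors \beta_k, n_k subjects in stratum k, mean vectors \mu_{ki}, working covariance matrices V_{ki}, and a tuning parameter \lambda. The penalty, weights, and pairing scheme actually used by the proposed method may differ.

```latex
% Stratum-specific GEE for stratum k (assumed notation):
U_k(\beta_k) \;=\; \sum_{i=1}^{n_k} D_{ki}^{\top} V_{ki}^{-1}
  \bigl\{ y_{ki} - \mu_{ki}(\beta_k) \bigr\},
\qquad
D_{ki} \;=\; \frac{\partial \mu_{ki}(\beta_k)}{\partial \beta_k^{\top}},
\qquad k = 1, \dots, K.

% A generic pairwise fusion penalty on the p coefficients across strata:
P_{\lambda}(\beta_1, \dots, \beta_K) \;=\;
  \lambda \sum_{j=1}^{p} \sum_{1 \le k < k' \le K}
  \bigl| \beta_{kj} - \beta_{k'j} \bigr|.
```

In a sketch of this kind, the estimate solves the stacked equations U_k(\beta_k) = 0 modified by a (sub)gradient of P_\lambda. When the penalty shrinks a difference \beta_{kj} - \beta_{k'j} exactly to zero, strata k and k' share the j-th coefficient and their data are pooled for that parameter; differences that stay nonzero keep those strata separate.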
