A novel robust approach for analysis of longitudinal data

Abstract: A new robust estimating equation approach for the analysis of longitudinal data is developed. To achieve robustness against outliers, the approach corrects the bias that outliers induce by centralizing the covariate matrix in the estimating equation. The covariates are centralized by subtracting their conditional expectations, which can be estimated by local linear smoothing. The consistency and asymptotic normality of the proposed estimator are established under some regularity conditions. Extensive simulation studies show that the proposed method is robust, highly efficient, and not restricted to specific error distributions. Finally, the proposed method is applied to a longitudinal study of prevalent patients with type 2 diabetes, where it confirms the effectiveness of dietary fibre intake in reducing glycated hemoglobin A1c level.
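The centralization step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which variable the conditional expectations are taken on, so the sketch assumes a single scalar conditioning variable `t` (for example, an observation time), a Gaussian kernel, and a fixed bandwidth. The function names `local_linear_fit` and `centralize` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def local_linear_fit(t, x, t0, bandwidth):
    """Local linear estimate of E[x | t = t0] using a Gaussian kernel.

    Assumption: t is a scalar conditioning variable; the abstract does not
    specify what the covariates are conditioned on.
    """
    d = t - t0
    w = np.exp(-0.5 * (d / bandwidth) ** 2)  # kernel weights
    s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
    r0, r1 = (w * x).sum(), (w * d * x).sum()
    # Intercept of the weighted least-squares line through (d, x)
    return (s2 * r0 - s1 * r1) / (s0 * s2 - s1 ** 2)


def centralize(t, x, bandwidth):
    """Subtract the estimated conditional expectation from each covariate value."""
    fitted = np.array([local_linear_fit(t, x, t0, bandwidth) for t0 in t])
    return x - fitted


# Illustrative synthetic data (not from the paper)
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, size=200))
x = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=200)
x_centered = centralize(t, x, bandwidth=0.1)
```

A local linear smoother reproduces an exactly linear trend, so a covariate that is linear in `t` is centralized to zero up to rounding error. In practice the bandwidth would be chosen by a data-driven rule rather than fixed by hand.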
