Analyzing evidence-based falls prevention data with significant missing information using variable selection after multiple imputation

Falls are the leading cause of fatal and non-fatal injuries among older adults. Evidence-based fall prevention programs are delivered nationwide, largely with funding from the Administration for Community Living (ACL), to mitigate fall-related risk. This study uses data from 39 ACL grantees in 22 states, collected from 2014 to 2017. The large proportion of missing values for falls efficacy in this national database can bias statistical results and makes reliable variable selection difficult. We used multiple imputation to handle the missing values. To obtain consistent variable selection across the multiply imputed datasets, we applied multiple imputation-stepwise regression (MI-stepwise) and multiple imputation-least absolute shrinkage and selection operator (MI-LASSO). We conducted simulation studies to compare the performance of the two methods. In particular, we extended prior work by considering several settings not covered in previous studies: data with different signal-to-noise ratios, various missing data patterns across predictors, and a data structure in which the missingness mechanism is missing not at random (MNAR). In addition, we evaluated the MI-LASSO method under varying tuning parameters to address the overselection issue in cross-validation (CV)-based LASSO.
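
To make the idea of consistent selection across imputed datasets concrete, the following is a minimal sketch and not the study's implementation. It assumes simulated data, a single covariate with values missing completely at random, a simplified stochastic regression imputation (regression parameters are not redrawn, so it is not fully proper multiple imputation), and a group-lasso penalty that ties each predictor's coefficients together across the imputed datasets, solved by proximal gradient descent. The function names `impute_once` and `mi_group_lasso`, the penalty value `lam=0.5`, and all simulation settings are hypothetical choices for illustration.

```python
import numpy as np


def impute_once(X, y, rng):
    """Stochastic regression imputation of column 0.

    Assumption for this sketch: column 0 is the only column with NaNs.
    """
    X = X.copy()
    miss = np.isnan(X[:, 0])
    # Predict column 0 from an intercept, the other columns, and the outcome.
    Z = np.column_stack([np.ones(len(y)), X[:, 1:], y])
    coef, *_ = np.linalg.lstsq(Z[~miss], X[~miss, 0], rcond=None)
    resid = X[~miss, 0] - Z[~miss] @ coef
    # Add residual noise so that the imputed datasets differ from one another.
    X[miss, 0] = Z[miss] @ coef + rng.normal(0.0, resid.std(), miss.sum())
    return X


def mi_group_lasso(X_list, y, lam, n_iter=2000):
    """Group lasso with one group per predictor across D imputed datasets.

    Minimizes sum_d ||y - X_d b_d||^2 / (2n) + lam * sum_j ||B[:, j]||_2
    by proximal gradient descent. Returns a (D, p) coefficient array in
    which a predictor is either zero in every dataset or nonzero in all.
    """
    D, (n, p) = len(X_list), X_list[0].shape
    B = np.zeros((D, p))
    # Step size from the largest Lipschitz constant of the D squared-error losses.
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2 / n for X in X_list)
    for _ in range(n_iter):
        grad = np.stack([X.T @ (X @ b - y) / n for X, b in zip(X_list, B)])
        V = B - step * grad
        # Group soft-thresholding: shrink each predictor's D coefficients jointly.
        norms = np.linalg.norm(V, axis=0)
        B = V * np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
    return B


# Simulated example: only predictors 0 and 1 affect the outcome.
rng = np.random.default_rng(0)
n, p = 200, 6
X_full = rng.normal(size=(n, p))
y = 2.0 * X_full[:, 0] - 1.5 * X_full[:, 1] + rng.normal(size=n)

X_obs = X_full.copy()
X_obs[rng.random(n) < 0.3, 0] = np.nan  # about 30% of column 0 set to missing

X_list = [impute_once(X_obs, y, rng) for _ in range(5)]
B = mi_group_lasso(X_list, y, lam=0.5)
group_norms = np.linalg.norm(B, axis=0)
selected = np.flatnonzero(group_norms > 0)
print("selected predictors:", selected)
```

Because the penalty acts on each predictor's coefficients as a group, the selected set is the same in every imputed dataset by construction. A stepwise procedure run separately on each dataset has no such guarantee and needs an extra rule for combining the results. In practice `lam` would be tuned, for example by an information criterion or cross-validation.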
