Principal component-based weighted indices and a framework to evaluate indices: Results from the Medical Expenditure Panel Survey 1996 to 2011

Indices composed of multiple input variables are embedded in many data processing and analytical methods. We aimed to test the feasibility of creating data-driven indices by aggregating input variables according to principal component analysis (PCA) loadings. To evaluate the significance of both theory-based and data-driven indices, we propose principles for reviewing innovative indices. We generated weighted indices from variables obtained in the first years of the two-year panels of the Medical Expenditure Panel Survey initiated between 1996 and 2011. Variables were weighted according to PCA loadings and summed. The statistical significance and residual deviance of each index in predicting mortality in the second years were extracted from the results of discrete-time survival analyses. A total of 237,832 participants survived the first years of the panels, representing 4.5 billion civilians in the United States, of whom 0.62% (95% CI = 0.58% to 0.66%) died in the second years of the panels. Of all 134,689 weighted indices, 40,803 significantly predicted mortality in the second years both with and without adjustment for age, sex and race. The indices significant in both models could at most support 10,200 years of academic tenure for an individual researcher publishing four indices per year, or 618.2 years of publishing for a journal with an annual volume of 66 articles. In conclusion, aggregating information based on PCA loadings can yield a large number of significant innovative indices composed of input variables of various predictive powers. To justify the large quantities of innovative indices, we propose a reporting and review framework for novel indices based on the objectives for creating the indices, variable weighting, related outcomes and database characteristics. The indices selected by this framework could lead to a new genre of publications focusing on meaningful aggregation of information.
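As a rough illustration of the weighting step described above, the following Python sketch builds one index per principal component by summing standardized input variables weighted by that component's loadings. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say whether variables were standardized, whether loadings were rescaled, or how survey weights entered the analysis. The function names `pca_weighted_indices` and `publication_years` and the simulated data are hypothetical. The second function simply reproduces the division behind the tenure and journal-volume figures quoted in the abstract.

```python
import numpy as np


def pca_weighted_indices(X, n_components=None):
    """Build PCA loading-weighted indices (illustrative assumption only).

    Each column of the returned `indices` is a weighted sum of the
    standardized input variables, using one principal component's
    unit-length loading vector as the weights.
    """
    X = np.asarray(X, dtype=float)
    # Assumption: variables are standardized before PCA.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt.T  # shape: (n_variables, n_components)
    if n_components is not None:
        loadings = loadings[:, :n_components]
    # Weighted sum of variables for every observation and component.
    indices = Z @ loadings
    return indices, loadings


def publication_years(n_indices, per_year):
    """Years needed to publish n_indices at a fixed yearly rate."""
    return n_indices / per_year


# Simulated stand-in data; the MEPS variables are not reproduced here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
indices, loadings = pca_weighted_indices(X, n_components=3)

# Arithmetic behind the figures quoted in the abstract.
tenure_years = publication_years(40803, 4)    # about 10,200 years
journal_years = publication_years(40803, 66)  # about 618.2 years
```

In a full analysis, each column of `indices` would then enter a mortality model as a predictor, once unadjusted and once adjusted for age, sex and race; that modelling step is not sketched here.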
