A Review of Methodological Approaches for Developing Diagnostic Algorithms for Diabetes Screening

Background and Purpose: Diagnostic algorithms are valuable tools for diabetes screening. This review aimed to evaluate the methodological approaches used to develop diagnostic algorithms for diabetes screening and to identify the most robust of them.

Methods: Following a literature search, the methodological quality of algorithm development studies was evaluated against the TRIPOD reporting guideline (Collins, Reitsma, Altman, & Moons, 2015).

Results: The methods used to develop the algorithms included logistic regression, classification and regression trees (CART), random forest, TreeNet, artificial neural networks, and naïve Bayes. The main methodological issues in the algorithm development studies concerned the handling of missing values, the reporting of recruitment methods, the categorization of continuous variables, and statistical controls.

Conclusions: Most studies exhibited critical methodological flaws and poor adherence to reporting standards. Diabetes screening algorithms can easily be made available electronically and used by nurses at minimal cost, even in underserved areas.
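
Categorization of continuous variables, one of the methodological issues identified in this review, can be illustrated with a minimal sketch. It computes the C-statistic (the area under the ROC curve, i.e. the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as half) for a predictor kept continuous and for the same predictor cut at a single threshold. The data values, the cut-off of 7.0, and the helper names `c_statistic` and `dichotomize` are hypothetical and chosen only for illustration; this is not a procedure taken from the reviewed studies.

```python
"""Toy illustration: dichotomizing a continuous predictor can lower discrimination.

All values below are hypothetical and exist only to show the mechanics.
"""
from itertools import product


def c_statistic(scores_cases, scores_noncases):
    """Concordance (C-statistic / ROC AUC) over all case/non-case pairs.

    A pair counts 1 when the case scores higher than the non-case,
    0.5 when the two scores tie, and 0 otherwise.
    """
    total = 0.0
    for case, noncase in product(scores_cases, scores_noncases):
        if case > noncase:
            total += 1.0
        elif case == noncase:
            total += 0.5
    return total / (len(scores_cases) * len(scores_noncases))


def dichotomize(values, threshold):
    """Replace each value by 1 if it is at or above the cut-off, else 0."""
    return [1 if v >= threshold else 0 for v in values]


# Hypothetical predictor values (arbitrary units) for people with and
# without diabetes; these are NOT real study data.
cases = [5.8, 6.4, 7.1, 7.9, 8.6]
noncases = [4.6, 5.0, 5.3, 5.9, 6.2]
CUTOFF = 7.0  # hypothetical cut-off chosen for illustration only

auc_continuous = c_statistic(cases, noncases)
auc_binary = c_statistic(dichotomize(cases, CUTOFF), dichotomize(noncases, CUTOFF))

print(f"C-statistic, continuous predictor:   {auc_continuous:.2f}")  # 0.92
print(f"C-statistic, dichotomized predictor: {auc_binary:.2f}")      # 0.80
```

With these toy values, cutting the predictor at a single threshold turns the two lowest-scoring cases into ties with every non-case, so ranking information that the continuous version used is discarded and the C-statistic falls.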

[1]  J. Veerman, et al. Population attributable fraction: names, types and issues with incorrect interpretation of relative risks, 2016, British Journal of Sports Medicine.

[2]  Laura C Rosella, et al. A population-based risk algorithm for the development of diabetes: development and validation of the Diabetes Population Risk Tool (DPoRT), 2010, Journal of Epidemiology & Community Health.

[3]  H. Green, et al. Use of theoretical and conceptual frameworks in qualitative research, 2014, Nurse Researcher.

[4]  E. Steyerberg, et al. Prognosis Research Strategy (PROGRESS) 3: Prognostic Model Research, 2013, PLoS Medicine.

[5]  Patrick Dattalo, et al. A Comparison of Discriminant Analysis and Logistic Regression, 1995.

[6]  J. Ioannidis, et al. Strengthening the reporting of genetic risk prediction studies: the GRIPS statement, 2011, Genetics in Medicine.

[7]  E. Steyerberg, et al. Reporting and Methods in Clinical Prediction Research: A Systematic Review, 2012, PLoS Medicine.

[8]  M. Pencina, et al. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data, 2011, Statistics in Medicine.

[9]  B. Tabachnick, et al. Using multivariate statistics, 5th ed., 2007.

[10]  J. Sowers, et al. Diabetes and cardiovascular disease, 1999, Diabetes Care.

[11]  K. Ananda Kumar, et al. Neural Networks In Medical And Healthcare, 2013.

[12]  Ivo D. Dinov, et al. Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data, 2016, GigaScience.

[13]  Y. Jang, et al. Standards of Medical Care in Diabetes-2010 by the American Diabetes Association: Prevention and Management of Cardiovascular Disease, 2010.

[14]  David A. Sontag, et al. Population-Level Prediction of Type 2 Diabetes From Claims Data and Analysis of Risk Factors, 2015, Big Data.

[15]  K. V. N. Sunitha, et al. TreeNet analysis of human stress behavior using socio-mobile data, 2016, Journal of Big Data.

[16]  O. Franco, et al. Impact of Healthy Lifestyle Factors on Life Expectancies in the US Population, 2018, Circulation.

[17]  Roman Timofeev, et al. Classification and Regression Trees (CART) Theory and Applications, 2004.

[18]  T. Ahmed, et al. Simple risk score to detect rural Asian Indian (Bangladeshi) adults at high risk for type 2 diabetes, 2015, Journal of Diabetes Investigation.

[19]  Nongyao Nai-arun, et al. Comparison of Classifiers for the Risk of Diabetes Prediction, 2015.

[20]  Wei Wang, et al. The study of statistical methods for evaluating the comparability of routine chemistry analytes among 3 routine laboratory measurement systems in China, 2016, SpringerPlus.

[21]  Wei-Yin Loh, et al. Classification and regression trees, 2011, WIREs Data Mining and Knowledge Discovery.

[22]  J. Stoltzfus, et al. Logistic regression: a brief primer, 2011, Academic Emergency Medicine.

[23]  I. Vlahavas, et al. Machine Learning and Data Mining Methods in Diabetes Research, 2017, Computational and Structural Biotechnology Journal.

[24]  A. Harris, et al. REporting recommendations for tumour MARKer prognostic studies (REMARK), 2005, British Journal of Cancer.

[25]  Paige L Williams, et al. Development of a clinical guideline to predict undiagnosed diabetes in dental patients, 2011, Journal of the American Dental Association.

[26]  D. Sacks. A1C Versus Glucose Testing: A Comparison, 2011, Diabetes Care.

[27]  F. Timmins. Nursing Research: Generating and Assessing Evidence for Nursing Practice, 2013.

[28]  T. Nakayama, et al. Optimal Hemoglobin A1c Levels for Screening of Diabetes and Prediabetes in the Japanese Population, 2015, Journal of Diabetes Research.

[29]  J. Tu, et al. Cardiovascular Disease Population Risk Tool (CVDPoRT): predictive algorithm for assessing CVD risk in the community setting. A study protocol, 2014, BMJ Open.

[30]  Y. Skaik. Understanding and using sensitivity, specificity and predictive values, 2008, Indian Journal of Ophthalmology.

[31]  J. Tuomilehto, et al. The validity of the Finnish Diabetes Risk Score for the prediction of the incidence of coronary heart disease and stroke, and total mortality, 2005, European Journal of Cardiovascular Prevention and Rehabilitation.

[32]  Azuraliza Abu Bakar, et al. Naïve Bayes variants in classification learning, 2010, 2010 International Conference on Information Retrieval & Knowledge Management (CAMP).

[33]  A. Vickers, et al. Against quantiles: categorization of continuous variables in epidemiologic research, and its discontents, 2012, BMC Medical Research Methodology.

[34]  J. S. Cramer. The Origins of Logistic Regression, 2002.

[35]  L. Smeeth, et al. Development and Validation of a Simple Risk Score for Undiagnosed Type 2 Diabetes in a Resource-Constrained Setting, 2016, Journal of Diabetes Research.

[36]  Yvonne Vergouwe, et al. Prognosis and prognostic research: what, why, and how?, 2009, BMJ.

[37]  Mohammad Khalilia, et al. Predicting disease risks from highly imbalanced data using random forest, 2011, BMC Medical Informatics and Decision Making.

[38]  Bo Zhang, et al. The long-term effect of lifestyle interventions to prevent diabetes in the China Da Qing Diabetes Prevention Study: a 20-year follow-up study, 2008, The Lancet.

[39]  Diana Adler, et al. Using Multivariate Statistics, 2016.

[40]  Viju Raghupathi, et al. Big data analytics in healthcare: promise and potential, 2014, Health Information Science and Systems.

[41]  John R. Clark. The Social Science Research Network, 2002.

[42]  Bruce Ratner, et al. Variable selection methods in regression: Ignorable problem, outing notable solution, 2010.

[43]  G. Tutz, et al. An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests, 2009, Psychological Methods.

[44]  Hui Li, et al. MTPGraph: A Data-Driven Approach to Predict Medical Risk Based on Temporal Profile Graph, 2016, 2016 IEEE Trustcom/BigDataSE/ISPA.

[45]  S. U. Gulumbe, et al. Identifying the Limitation of Stepwise Selection for Variable Selection in Regression Analysis, 2015.

[46]  N. Wareham, et al. Estimating the population impact of screening strategies for identifying and treating people at high risk of cardiovascular disease: modelling study, 2010, BMJ.

[47]  I. Stratton, et al. Development and validation of a Diabetes Risk Score for screening undiagnosed diabetes in Sri Lanka (SLDRISK), 2016, BMC Endocrine Disorders.

[48]  G. Collins, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement, 2015, Annals of Internal Medicine.

[49]  Xuesong Han, et al. Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence, 2017, PeerJ.

[50]  J. Zhang, et al. Long-term effects of a randomised trial of a 6-year lifestyle intervention in impaired glucose tolerance on diabetes-related microvascular complications: the China Da Qing Diabetes Prevention Outcome Study, 2011, Diabetologia.

[51]  K. Khunti, et al. The development and validation of the Portuguese risk score for detecting type 2 diabetes and impaired fasting glucose, 2013.

[52]  2. Classification and Diagnosis of Diabetes: Standards of Medical Care in Diabetes—2018, 2017, Diabetes Care.

[53]  N. White, et al. Tool guide for lifestyle behavior change in a cardiovascular risk reduction program, 2013, Psychology Research and Behavior Management.

[54]  C. Rembold. Number needed to screen: development of a statistic for disease screening, 1998, BMJ.

[55]  C. Florkowski. Sensitivity, specificity, receiver-operating characteristic (ROC) curves and likelihood ratios: communicating the performance of diagnostic tests, 2008, The Clinical Biochemist Reviews.