Application of Random Forests Methods to Diabetic Retinopathy Classification Analyses

Background
Diabetic retinopathy (DR) is one of the leading causes of blindness in the United States and worldwide. DR is a silent disease that may go unnoticed until it is too late for effective treatment. Early detection could therefore improve the chances that therapeutic interventions alleviate its effects.

Methodology
Graded fundus photography and systemic data from 3443 ACCORD-Eye Study participants were used to estimate Random Forest (RF) and logistic regression classifiers. We studied the impact of sample size on classifier performance and the possibility of using RF-generated class-conditional probabilities as metrics describing DR risk. RF measures of variable importance were used to detect factors that affect classification performance.

Principal Findings
Both types of data were informative when discriminating participants with or without DR. RF-based models produced much higher classification accuracy than those based on logistic regression. Combining both types of data did not increase accuracy but did increase statistical discrimination of healthy participants who subsequently did or did not have DR events during four years of follow-up. RF variable importance criteria revealed that microaneurysm counts in both eyes seemed to play the most important role in discrimination among the graded fundus variables, while the number of medicines and diabetes duration were the most relevant among the systemic variables.

Conclusions and Significance
We have introduced RF methods to DR classification analyses based on fundus photography data. In addition, we propose an approach to DR risk assessment based on metrics derived from graded fundus photography and systemic data. Our results suggest that RF methods could be a valuable tool to diagnose DR and evaluate its progression.
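The idea of using RF-generated class-conditional probabilities as a risk metric can be illustrated with a minimal sketch. This is not the study's implementation: it assumes synthetic data, uses bagged one-split decision stumps with random feature selection as a simplified stand-in for full random forest trees, and every name in it (`make_data`, `fit_stump`, `fit_forest`, `risk_score`) is hypothetical. The risk score for a record is simply the fraction of trees that vote for the positive class.

```python
import random


def make_data(n, rng):
    """Synthetic records (features, label); NOT ACCORD-Eye data.

    Feature 0 is informative (a hypothetical stand-in for a lesion count),
    feature 1 is pure noise.
    """
    data = []
    for _ in range(n):
        y = int(rng.random() < 0.5)
        x0 = rng.gauss(2.0 if y else 0.0, 1.0)
        x1 = rng.gauss(0.0, 1.0)
        data.append(([x0, x1], y))
    return data


def fit_stump(sample, feature):
    """Fit a one-split tree on `feature` by minimizing training errors.

    Returns (feature, threshold, left_label, right_label).
    """
    best = None
    for t in sorted({x[feature] for x, _ in sample}):
        left = [y for x, y in sample if x[feature] <= t]
        right = [y for x, y in sample if x[feature] > t]
        left_label = int(2 * sum(left) > len(left))
        # If the right side is empty, reuse the left label.
        right_label = int(2 * sum(right) > len(right)) if right else left_label
        errors = sum(y != left_label for y in left)
        errors += sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, feature, t, left_label, right_label)
    return best[1:]


def predict_stump(stump, x):
    feature, t, left_label, right_label = stump
    return left_label if x[feature] <= t else right_label


def fit_forest(data, n_trees, rng):
    """Bagging plus random feature choice, the two ingredients of an RF."""
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
        feature = rng.randrange(n_features)           # one random feature per tree
        forest.append(fit_stump(bootstrap, feature))
    return forest


def risk_score(forest, x):
    """Estimated P(class 1 | x): the fraction of trees voting for class 1."""
    return sum(predict_stump(s, x) for s in forest) / len(forest)


if __name__ == "__main__":
    rng = random.Random(0)
    train = make_data(200, rng)
    forest = fit_forest(train, 50, rng)
    for x, y in make_data(5, rng):
        print(f"label={y} risk={risk_score(forest, x):.2f}")
```

Because the score is a vote fraction, it always lies between 0 and 1 and can be thresholded for classification or used directly as a graded risk measure, which is the role the abstract describes for the RF probabilities.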
