Comparison of two data mining techniques in labeling diagnosis to Iranian pharmacy claim dataset: artificial neural network (ANN) versus decision tree model.

BACKGROUND This study aimed to evaluate and compare the prediction accuracy of two data mining techniques, decision tree and artificial neural network (ANN) models, in assigning diagnoses to gastrointestinal (GI) prescriptions in Iran.

METHODS The study was conducted in three phases: data preparation, training, and testing. A sample was drawn from a database of 23 million pharmacy insurance claim records covering 2004 to 2011. In total, 330 prescriptions were assessed and used both to train and to test the models. In the training phase, a physician and a pharmacist separately assessed the selected prescriptions and assigned a diagnosis to each. To test the performance of each model, stratified k-fold cross-validation was conducted, and sensitivity and specificity were measured.

RESULT Overall, the two methods had very similar accuracy. In terms of the weighted averages of the true positive rate (sensitivity) and true negative rate (specificity), the decision tree classified prescriptions slightly more accurately (83.3% and 96% versus 80.3% and 95.1%, respectively). However, in terms of the weighted average area under the ROC curve (AUC, computed for each class against all other classes), the ANN predicted the diagnosis more accurately (93.8% versus 90.6%).

CONCLUSION According to the results of this study, artificial neural network and decision tree models show similar accuracy in assigning diagnoses to GI prescriptions.
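The abstract names the evaluation protocol (stratified k-fold cross-validation) and the reported metrics (weighted sensitivity, specificity, and one-vs-rest ROC area), but not how they were computed. The following is a minimal pure-Python sketch that assumes each per-class figure is computed one-vs-rest and averaged with weights equal to that class's share of the true labels. It is not the authors' implementation, and the abstract does not name their software. The function names and the toy labels "A", "B", "C" are hypothetical. The classifiers themselves are not shown: any model that returns a predicted label and per-class scores could be evaluated this way.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Split item indices into k test folds with roughly equal class proportions.

    Indices of each class are shuffled, then dealt round-robin across the folds,
    continuing from where the previous class stopped so fold sizes stay balanced.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    dealt = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        for i in idx:
            folds[dealt % k].append(i)
            dealt += 1
    return folds


def weighted_rates(y_true, y_pred):
    """Class-frequency-weighted one-vs-rest sensitivity (TPR) and specificity (TNR)."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    sens = spec = 0.0
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        w = (tp + fn) / n  # assumed weight: this class's share of the true labels
        sens += w * tp / (tp + fn)
        spec += w * (tn / (tn + fp) if tn + fp else 0.0)
    return sens, spec


def auc_one_vs_rest(scores, is_pos):
    """ROC area as the probability that a random positive outscores a random
    negative (Mann-Whitney form; ties count as half).
    Requires at least one positive and one negative item."""
    pos = [s for s, p in zip(scores, is_pos) if p]
    neg = [s for s, p in zip(scores, is_pos) if not p]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def weighted_auc(y_true, probs):
    """Class-frequency-weighted one-vs-rest AUC; probs[i][c] is the score for class c."""
    n = len(y_true)
    total = 0.0
    for c in sorted(set(y_true)):
        is_pos = [t == c for t in y_true]
        scores = [row[c] for row in probs]
        total += (sum(is_pos) / n) * auc_one_vs_rest(scores, is_pos)
    return total


if __name__ == "__main__":
    # Toy labels "A", "B", "C" stand in for diagnosis classes (hypothetical data).
    labels = ["A"] * 6 + ["B"] * 3 + ["C"] * 3
    for f, fold in enumerate(stratified_k_fold(labels, k=3)):
        print(f, sorted(labels[i] for i in fold))
    y_true = ["A", "A", "B", "B", "C", "C"]
    y_pred = ["A", "B", "B", "B", "C", "A"]
    print(weighted_rates(y_true, y_pred))
```

Under this weighting, the weighted sensitivity is algebraically equal to overall accuracy: the toy predictions get 4 of 6 right, giving about 0.667, with a weighted specificity of about 0.833. For real analyses, scikit-learn's `StratifiedKFold` and `roc_auc_score(..., multi_class="ovr", average="weighted")` are tested equivalents of the fold split and the weighted AUC.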
