Predicting disease by using data mining based on healthcare information system

This paper applies the data mining process to predict hypertension from patient medical records covering eight other diseases. A sample of 9,862 cases was studied, extracted from a real-world Healthcare Information System database containing 309,383 medical records. The distribution of patient diseases in the medical database was observed to be imbalanced. Under-sampling was applied to generate the training data sets, and the data mining tool Weka was used to build Naive Bayesian and J-48 classifiers. In addition, an ensemble of five J-48 classifiers was created in an attempt to improve prediction performance, and rough set tools were used to reduce the ensemble based on the idea of second-order approximation. Experimental results showed a small improvement of the ensemble approach over a single Naive Bayesian or J-48 classifier in accuracy, sensitivity, and F-measure.
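As a rough illustration of the terms used in the abstract, the sketch below shows random under-sampling, majority voting across ensemble members, and the three reported measures in plain Python. It is not the authors' Weka workflow: the function names (`undersample`, `majority_vote`, `evaluate`), the choice to balance every class down to the size of the smallest one, and the use of simple majority voting are all assumptions made for illustration, and the rough-set reduction step is not covered.

```python
import random
from collections import Counter


def undersample(records, label_of, seed=0):
    """Randomly keep only as many records per class as the smallest class has.

    Assumption: the classes are balanced 1:1; the paper may use another ratio.
    """
    rng = random.Random(seed)
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


def majority_vote(predictions):
    """Combine one predicted label per ensemble member into a single label."""
    return Counter(predictions).most_common(1)[0][0]


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (recall), and F-measure for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f_measure": f_measure}


if __name__ == "__main__":
    # Hypothetical toy data: 10 positive and 90 negative records.
    records = [({"id": i}, 1 if i < 10 else 0) for i in range(100)]
    # Five differently seeded balanced training sets, one per ensemble member.
    training_sets = [undersample(records, lambda r: r[1], seed=s) for s in range(5)]
    print([len(ts) for ts in training_sets])
    print(majority_vote([1, 0, 1, 1, 0]))
    print(evaluate([1, 1, 0, 0], [1, 0, 0, 1]))
```

Drawing each training set with a different seed gives every ensemble member a different subset of the majority class, which is one common way to make the members disagree enough for voting to help.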

Chien-Chung Chan | Shengyong Wang | Feixiang Huang

[1] Norberto F. Ezquerra,et al. Mining constrained association rules to predict heart disease , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[2] M Anbarasi,et al. Enhanced Prediction of Heart Disease with Feature Subset Selection Using Genetic Algorithm , 2010 .

[3] D. Lubeck,et al. Predicting disease recurrence in intermediate and high-risk patients undergoing radical prostatectomy using percent positive biopsies: results from CaPSURE. , 2002, Urology.

[4] Merrick I Ross,et al. Positive surgical margins and ipsilateral breast tumor recurrence predict disease‐specific survival after breast‐conserving therapy , 2003, Cancer.

[5] D O Cosgrove,et al. Hepatic vein transit times using a microbubble agent can predict disease severity non-invasively in patients with hepatitis C , 2004, Gut.

[6] Ian H. Witten,et al. The WEKA data mining software: an update , 2009, SIGKDD Explorations.

[7] A. Sharrett,et al. Coronary Heart Disease Prediction From Lipoprotein Cholesterol Levels, Triglycerides, Lipoprotein(a), Apolipoproteins A-I and B, and HDL Density Subfractions: The Atherosclerosis Risk in Communities (ARIC) Study , 2001, Circulation.

[8] L. Mofenson,et al. Maternal and Infant Factors Predicting Disease Progression in Human Immunodeficiency Virus Type 1-Infected Infants , 2000, Pediatrics.

[9] J. Bell. Predicting disease using genomics , 2004, Nature.

[10] Manuel Hidalgo,et al. Expression of epiregulin and amphiregulin and K-ras mutation status predict disease control in metastatic colorectal cancer patients treated with cetuximab. , 2007, Journal of clinical oncology : official journal of the American Society of Clinical Oncology.

[11] P. Greenland,et al. Coronary artery calcium score and risk classification for coronary heart disease prediction. , 2010, JAMA.

[12] Frances S. Turner,et al. POCUS: mining genomic sequence annotation to predict disease genes , 2003, Genome Biology.

[13] Nitesh V. Chawla,et al. Editorial: special issue on learning from imbalanced data sets , 2004, SIGKDD Explorations.

[14] C. Stegeman,et al. Anti-neutrophil cytoplasmic antibody (ANCA) levels directed against proteinase-3 and myeloperoxidase are helpful in predicting disease relapse in ANCA-associated small-vessel vasculitis. , 2002, Nephrology, dialysis, transplantation : official publication of the European Dialysis and Transplant Association - European Renal Association.

[15] Cornelis J H van de Velde,et al. Validation of a nomogram for predicting disease‐specific survival after an R0 resection for gastric carcinoma , 2005, Cancer.

[16] Robert C. Holte,et al. C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling , 2003 .

[17] H. Moser,et al. X-linked adrenoleukodystrophy: the role of contrast-enhanced MR imaging in predicting disease progression. , 2000, AJNR. American journal of neuroradiology.

[18] Chris Drummond,et al. C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling , 2003 .

[19] Ms. Ishtake. Intelligent Heart Disease Prediction System Using Data Mining Techniques .

[20] Szymon Wilk,et al. Rough Set Based Data Exploration Using ROSE System , 1999, ISMIS.