Outlier Mining in Medical Databases: An Application of Data Mining in Health Care Management to Detect Abnormal Values Presented In Medical Databases

Outliers in medical databases can be caused by measurement errors or may result from inherent data variability. An abnormal mitoses value, for instance, could lead to a diagnosis of malignant cancer, or it might simply be due to human mistake or execution error. In this paper, we use a large database, namely the Wisconsin Breast Cancer Database, which contains 10 attributes and 699 instances, to detect outliers. Many data mining algorithms try to minimize the influence of outliers, which could result in the loss of important hidden information, since “one person's noise could be another person's signal”. In particular, we used TANAGRA (a data mining tool) to detect outliers in the Breast Cancer Database and analyzed them for knowledge discovery. The results of the experiment show that outlier mining, i.e. outlier detection and analysis, has great potential to find useful information in health care databases, which in turn helps decision makers automate and speed up decision making in clinical diagnosis as well as in other domains of health care management.
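
To make the idea of flagging an abnormal attribute value concrete, the following is a minimal sketch in Python. It is not the TANAGRA procedure used in this work, and the abstract does not state which detection rule was applied. The sketch assumes Tukey's interquartile-range rule purely for illustration. The function name `iqr_outliers`, the multiplier `k`, and the sample mitoses scores are all hypothetical and are not records from the database.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (index, value) pairs lying outside Tukey's fences.

    Assumption: a univariate IQR rule stands in for whatever
    detection method the actual tool applies.
    """
    # Quartiles of the sample (inclusive method treats data as the population range).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


# Hypothetical mitoses scores (illustrative values only, not real records).
mitoses = [1, 1, 1, 2, 1, 1, 3, 1, 2, 10]
flagged = iqr_outliers(mitoses)
print(flagged)
```

A flagged value is only a candidate for review: as noted above, it may reflect a recording error or a clinically meaningful signal, so the analysis step that follows detection matters as much as the detection itself.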