A Hybrid Outlier Detection Method for Health Care Big Data

Technology advancements in health care informatics, digitalizing health records, and telemedicine has resulted in rapid growth of health care data. One challenge is how to effectively discover useful and important information out of such massive amount of data through techniques such as data mining. Outlier detection is a typical technique used in many fields to analyze big data. However, for the large scale and high-dimensional heath care data, the conventional outlier detection methods are not efficient. This paper proposes a novel hybrid outlier detection method, namely, Pruning-based K-Nearest Neighbor (PB-KNN), which integrates the density-based, cluster-based methods and KNN algorithm to conduct effective outlier detection. The proposed PB-KNN adopts the case classification quality character (CCQC) as the medical quality evaluation model and uses the attribute overlapping rate (AOR) algorithm for data classification and dimensionality reduction. To evaluate the performance of the pruning operations in PB-KNN, we conduct extensive experiments. The experiment results show that the PB-KNN method outperforms the k-nearest neighbor (KNN) and local outlier factor (LOF) in terms of the accuracy and efficiency.

[1]  Raymond T. Ng,et al.  Distance-based outliers: algorithms and applications , 2000, The VLDB Journal.

[2]  Charu C. Aggarwal,et al.  Outlier Analysis , 2013, Springer New York.

[3]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[4]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[5]  A. Donabedian Evaluating the quality of medical care. 1966. , 1966, The Milbank quarterly.

[6]  Sung-Ryul Kim,et al.  Clustering Algorithm Based on Outlier Detection for Anomaly Intrusion Detection , 2016 .

[7]  Beng Chin Ooi,et al.  Efficient Processing of k Nearest Neighbor Joins using MapReduce , 2012, Proc. VLDB Endow..

[8]  S. Vijayarani Ms. P. Jothi Detecting Outliers in Data streams usingClustering Algorithms , 2013 .

[9]  Derya Birant,et al.  ST-DBSCAN: An algorithm for clustering spatial-temporal data , 2007, Data Knowl. Eng..

[10]  Tang Jin-hai An application of case classification quality character model in evaluating medical quality of breast cancer , 2008 .

[11]  S. Kannan and K. Somasundaram,et al.  A Review of Outlier Prediction Techniques in Data Mining , 2015 .

[12]  H. Koh,et al.  Data mining applications in healthcare. , 2005, Journal of healthcare information management : JHIM.

[13]  Jos van Hillegersberg,et al.  Outlier detection in healthcare fraud: A case study in the Medicaid dental domain , 2016, Int. J. Account. Inf. Syst..

[14]  Viju Raghupathi,et al.  Big data analytics in healthcare: promise and potential , 2014, Health Information Science and Systems.

[15]  Daniel T. Larose,et al.  k‐Nearest Neighbor Algorithm , 2005 .

[16]  Donald K. Wedding,et al.  Discovering Knowledge in Data, an Introduction to Data Mining , 2005, Inf. Process. Manag..

[17]  T NgRaymond,et al.  Distance-based outliers: algorithms and applications , 2000, VLDB 2000.

[18]  James M. Keller,et al.  A fuzzy K-nearest neighbor algorithm , 1985, IEEE Transactions on Systems, Man, and Cybernetics.

[19]  J. F. Milne,et al.  Modern hospital management , 1969 .

[20]  Victoria J. Hodge,et al.  A Survey of Outlier Detection Methodologies , 2004, Artificial Intelligence Review.

[21]  José Maria Paganni,et al.  Evaluating the quality of medical care. , 1966, The Milbank Memorial Fund quarterly.