Sequence Mining and Prediction-Based Healthcare Fraud Detection Methodology

This article presents a novel methodology for detecting insurance-claim fraud in the healthcare system using sequence mining and sequence prediction. Fraud detection in healthcare is non-trivial because healthcare records are heterogeneous. Fraudsters behave like normal patients and change their methods of committing fraud over time; hence, there is a need for fraud detection models that can keep up with these changes. Previous research has not considered sequence generation and has mostly focused on amount-based analysis or sequential analysis of medications versus diseases. The proposed methodology generates sequences of the services availed or prescribed within each specialty and analyses them through two cascaded checks to detect insurance-claim fraud. The methodology addresses these challenges and learns on its own from historical medical records. It is based on two modules: a sequence rule engine and a prediction-based engine. The sequence rule engine generates frequent sequences and the probabilities of rare sequences for each specialty of the hospital. Comparing these sequences with actual patient sequences identifies anomalies, namely patient sequences that do not comply with the sequences of the rule engine. The system then analyses all non-compliant sequences in further detail in the prediction-based engine. The methodology is validated by generating patient sequences from the last five years of transactional data of a local hospital and identifying patterns of service procedures administered to patients using the PrefixSpan algorithm and the Compact Prediction Tree. Various experiments were performed to validate the applicability of the developed methodology, and the results demonstrate that it is pertinent for detecting healthcare fraud, with an average accuracy of 85%. It can thus help prevent fraudulent claims and provide better insight into how to improve patient management and treatment procedures.
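The two cascaded checks described above can be illustrated with a minimal sketch. It is not the authors' implementation: the service codes, the `min_support` and `threshold` values, the rule that a patient sequence is compliant only if it is itself a frequent pattern, and all function names are assumptions made for illustration. The first stage is a simplified PrefixSpan restricted to single-item sequence elements. The second stage uses a first-order successor count as a simple stand-in for the Compact Prediction Tree, which is a different and richer structure.

```python
from collections import Counter, defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for all frequent sequential patterns.

    Simplified PrefixSpan: every sequence element is one service code.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs.
        # Record the first position of each item per projected sequence,
        # so an item is counted at most once per sequence.
        first_pos = defaultdict(dict)
        for sid, start in projected:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                first_pos[seq[pos]].setdefault(sid, pos)
        for item, occ in first_pos.items():
            if len(occ) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(occ)
                # Grow the pattern on the suffixes that follow the item.
                mine(pattern, [(sid, pos + 1) for sid, pos in occ.items()])

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


def build_successors(sequences):
    """Count which service immediately follows each service (CPT stand-in)."""
    successors = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            successors[current][following] += 1
    return successors


def deviation_score(seq, successors):
    """Fraction of steps where the actual next service differs from the
    most frequent historical successor of the current service."""
    steps = list(zip(seq, seq[1:]))
    if not steps:
        return 0.0
    misses = 0
    for current, actual in steps:
        counts = successors.get(current)
        predicted = counts.most_common(1)[0][0] if counts else None
        if predicted != actual:
            misses += 1
    return misses / len(steps)


def screen(seq, rules, successors, threshold=0.5):
    """Cascade: rule-engine check first, prediction-based check second."""
    if tuple(seq) in rules:  # assumed compliance rule
        return "compliant"
    if deviation_score(seq, successors) > threshold:
        return "suspicious"
    return "cleared"


# Hypothetical historical sequences for one specialty.
history = [
    ["consult", "blood_test", "xray", "discharge"],
    ["consult", "blood_test", "discharge"],
    ["consult", "xray", "discharge"],
    ["consult", "blood_test", "xray", "discharge"],
]
rules = prefixspan(history, min_support=2)
successors = build_successors(history)

for patient in (["consult", "blood_test", "discharge"],
                ["consult", "xray", "blood_test", "mri", "discharge"]):
    print(patient, "->", screen(patient, rules, successors))
```

In this sketch only sequences that fail the rule-engine check reach the more detailed second stage, which mirrors the cascade in the abstract. A real deployment would mine rules per specialty and would need thresholds tuned on labelled claims.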
