Medicare fraud detection using neural networks

Access to affordable healthcare is a nationwide concern that impacts a large majority of the United States population. Medicare is a Federal Government healthcare program that provides affordable health insurance to the elderly population and individuals with select disabilities. Unfortunately, there is a significant amount of fraud, waste, and abuse within the Medicare system that costs taxpayers billions of dollars and puts beneficiaries’ health and welfare at risk. Previous work has shown that publicly available Medicare claims data can be leveraged to construct machine learning models capable of automating fraud detection, but challenges associated with class-imbalanced big data hinder performance. Because the minority class makes up only 0.03% of the data and existing results leave room for improvement, we use the Medicare fraud detection task to compare six deep learning methods designed to address the class imbalance problem. The data-level techniques used in this study are random over-sampling (ROS), random under-sampling (RUS), and a hybrid ROS–RUS. The algorithm-level techniques evaluated are a cost-sensitive loss function, the Focal Loss, and the Mean False Error Loss. A range of class ratios is tested by varying sample rates, and desirable class-wise performance is achieved by identifying optimal decision thresholds for each model. Neural networks are evaluated on a 20% holdout test set, and results are reported using the area under the receiver operating characteristic curve (AUC). Results show that ROS and ROS–RUS perform significantly better than baseline and algorithm-level methods, with average AUC scores of 0.8505 and 0.8509, respectively, while ROS–RUS maximizes efficiency with a 4× speedup in training time. Plain RUS outperforms baseline methods with up to a 30× improvement in training time, and all algorithm-level methods are found to produce more stable decision boundaries than baseline methods.
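
To make the three data-level techniques concrete, the following is a minimal sketch rather than the authors' implementation. It assumes samples are referred to by integer index, that the target class ratio is given as a minority fraction, and that the hybrid under-samples the majority class before over-sampling the minority class. The function names and the `neg_keep_frac` parameter are hypothetical.

```python
import random


def random_under_sample(neg, pos, target_pos_frac, rng):
    """RUS: keep a random subset of the majority (negative) class, drawn
    without replacement, so positives make up target_pos_frac of the result."""
    n_neg = round(len(pos) * (1 - target_pos_frac) / target_pos_frac)
    return rng.sample(neg, min(n_neg, len(neg))), list(pos)


def random_over_sample(neg, pos, target_pos_frac, rng):
    """ROS: duplicate minority (positive) samples, drawn with replacement,
    until positives make up target_pos_frac of the result."""
    n_pos = round(len(neg) * target_pos_frac / (1 - target_pos_frac))
    extra = rng.choices(pos, k=max(n_pos - len(pos), 0))
    return list(neg), list(pos) + extra


def hybrid_ros_rus(neg, pos, neg_keep_frac, target_pos_frac, rng):
    """ROS-RUS (assumed ordering): first discard part of the majority class,
    then over-sample the minority class up to the target ratio."""
    kept_neg = rng.sample(neg, round(len(neg) * neg_keep_frac))
    return random_over_sample(kept_neg, pos, target_pos_frac, rng)


# Toy index sets: 3 positives among 10,003 samples (about 0.03% positive).
rng = random.Random(0)
neg = list(range(10_000))
pos = list(range(10_000, 10_003))

rus_neg, rus_pos = random_under_sample(neg, pos, 0.5, rng)
ros_neg, ros_pos = random_over_sample(neg, pos, 0.5, rng)
hyb_neg, hyb_pos = hybrid_ros_rus(neg, pos, 0.1, 0.5, rng)

for name, n, p in [("RUS", rus_neg, rus_pos),
                   ("ROS", ros_neg, ros_pos),
                   ("ROS-RUS", hyb_neg, hyb_pos)]:
    print(f"{name}: {len(n)} negative, {len(p)} positive")
```

In this toy setting, all three variants reach a balanced 50:50 ratio, but the resulting training sets differ greatly in size. RUS is the smallest, ROS the largest, and the hybrid falls in between, which is in line with the training-time differences reported above.
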
Thresholding results suggest that the decision threshold should always be optimized using a validation set, as we observe a strong linear relationship between the minority class size and the optimal threshold. To the best of our knowledge, this is the first study to compare multiple data-level and algorithm-level deep learning methods across a range of class distributions. Additional contributions include a unique analysis of the relationship between minority class size and optimal decision threshold, and state-of-the-art performance on the given Medicare fraud detection task.
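
Threshold selection on a validation set can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the selection criterion (maximizing the geometric mean of the true positive and true negative rates), the candidate grid, the toy scores, and the function names are all hypothetical.

```python
import math


def class_wise_rates(scores, labels, threshold):
    """Return (TPR, TNR) when scores >= threshold are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def best_threshold(scores, labels, candidates):
    """Pick the candidate that maximizes the geometric mean of TPR and TNR
    (an assumed criterion; ties resolve to the first candidate in order)."""
    def g_mean(t):
        tpr, tnr = class_wise_rates(scores, labels, t)
        return math.sqrt(tpr * tnr)
    return max(candidates, key=g_mean)


# Toy validation scores from a hypothetical model trained on imbalanced data:
# every score sits far below the default threshold of 0.5.
val_scores = [0.001, 0.002, 0.002, 0.003, 0.004, 0.006, 0.010, 0.020]
val_labels = [0, 0, 0, 0, 0, 1, 0, 1]
candidates = [i / 1000 for i in range(1, 1000)]

threshold = best_threshold(val_scores, val_labels, candidates)
tpr, tnr = class_wise_rates(val_scores, val_labels, threshold)
print(f"threshold={threshold}, TPR={tpr:.3f}, TNR={tnr:.3f}")
```

On these toy scores, a default threshold of 0.5 would label every sample negative, whereas the selected threshold recovers both positives at the cost of one false positive. The chosen threshold would then be held fixed when scoring the holdout test set.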
