Paper information - Interactive learning for efficiently detecting errors in insurance claims


Many practical data mining systems, such as those for fraud detection and surveillance, build classifiers that are not autonomous but are part of a larger interactive system with an expert in the loop. The goal of these systems is not just to maximize the performance of the classifier but to make the experts more efficient at performing their task, thus maximizing the overall return on investment of the system. This paper describes an interactive system for detecting payment errors in insurance claims with claim auditors in the loop. We describe an interactive claims prioritization component that uses an online cost-sensitive learning approach (more-like-this) to make the system efficient. Our interactive prioritization component is built on top of a batch classifier that has been trained to detect payment errors in health insurance claims, and it optimizes the interaction between the classifier and the domain experts who consume the results of this system. The goal is to make these auditors more efficient and effective while also improving the classification performance of the system. The result is both a reduction in the time it takes auditors to review and label claims and an improvement in the precision of the system in finding payment errors. We show results obtained from applying this system at two major US health insurance companies, indicating a significant reduction in claim audit costs and potential savings of $20-$26 million per year, which would make the insurance providers more efficient and lower their operating costs. Our system reduces the money wasted by providers and insurers in dealing with incorrectly processed claims and makes the healthcare system more efficient.
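The abstract does not spell out how the more-like-this prioritization works, so the following is only a rough sketch of one way such a step could look, not the authors' method. It assumes each claim in the audit queue carries a batch-classifier score and a numeric feature vector, and that after an auditor labels one claim, the scores of similar claims are nudged up (if it was an error) or down (if it was not). The function names, the `weight` parameter, and the use of cosine similarity are all hypothetical choices made for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(queue, labeled_vec, is_error, weight=0.5):
    """Re-rank the audit queue after one auditor label (hypothetical helper).

    queue: list of (claim_id, score, feature_vector) tuples, where score
           is assumed to come from a previously trained batch classifier.
    labeled_vec: feature vector of the claim the auditor just labeled.
    is_error: True if the auditor confirmed a payment error.
    weight: assumed strength of the similarity adjustment.
    """
    sign = 1.0 if is_error else -1.0
    updated = [
        (cid, score + sign * weight * cosine(vec, labeled_vec), vec)
        for cid, score, vec in queue
    ]
    # Highest adjusted score first, so similar likely-errors surface together.
    return sorted(updated, key=lambda item: item[1], reverse=True)


# Toy queue: claim A scores highest, but B and C resemble the labeled claim.
queue = [
    ("A", 0.60, [1.0, 0.0]),
    ("B", 0.55, [0.0, 1.0]),
    ("C", 0.50, [0.0, 2.0]),
]
after = rerank(queue, labeled_vec=[0.0, 1.0], is_error=True)
order = [cid for cid, _, _ in after]
```

In this toy run, claims B and C move ahead of A because they point in the same direction as the claim just confirmed as an error. Grouping similar claims this way is one plausible reading of how an auditor's review time per claim could drop, since consecutive claims would share context.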

Rayid Ghani | Mohit Kumar
