Unsupervised Labeling of Data for Supervised Learning and its Application to Medical Claims Prediction

Identifying changes and irregularities in medical insurance claim payments is a difficult task. The traditional practice involves querying historical claims databases and flagging potential claims as normal or abnormal. Because what counts as a normal payment is usually unknown and may change over time, abnormal payments often pass undetected, only to be discovered after the payment period has passed.

This paper addresses the problem of on-line unsupervised learning from data streams when the distribution that generates the data changes or drifts over time. Automated algorithms for detecting drifting concepts in the probability distribution of the data are presented. The idea behind the drift detection methods is to transform the distribution of the data within a sliding window into a more convenient distribution. Then, the p-value of a test statistic at a given significance level can be used to infer the drift rate, adjust the window size, and decide on the status of the drift. The detected concept drifts are used to label the data for subsequent learning of classification models by a supervised learner. The algorithms were tested on several synthetic and real medical claims data sets.
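The sliding-window idea can be illustrated with a minimal sketch. It is not the paper's algorithm: as a stand-in for the transformation step it assumes that window means are approximately normal, compares a reference window with the most recent window using a two-sample z-test, and starts a new concept label whenever the p-value falls below the significance level. The function names, the window size, and the value of alpha are all hypothetical choices, and the adaptive window sizing mentioned in the abstract is left out.

```python
import math
from statistics import mean, variance


def two_sided_p_value(reference, recent):
    """Two-sided p-value of a two-sample z-test on the window means."""
    se = math.sqrt(variance(reference) / len(reference)
                   + variance(recent) / len(recent))
    if se == 0.0:
        # Degenerate case: both windows are constant.
        return 1.0 if mean(reference) == mean(recent) else 0.0
    z = (mean(recent) - mean(reference)) / se
    # P(|Z| > |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))


def label_by_drift(stream, window=30, alpha=0.001):
    """Label each value with a concept id that increments at each detected drift.

    Assumption (not from the paper): drift is a shift in the mean, tested
    between the first `window` values of the current concept and the most
    recent `window` values.
    """
    labels = []
    concept = 0
    start = 0  # index at which the current concept began
    for i in range(len(stream)):
        labels.append(concept)
        if i + 1 - start < 2 * window:
            continue  # not enough data yet for two disjoint windows
        reference = stream[start:start + window]
        recent = stream[i + 1 - window:i + 1]
        if two_sided_p_value(reference, recent) < alpha:
            concept += 1
            start = i + 1  # later values belong to the new concept
    return labels


if __name__ == "__main__":
    # Hypothetical payment amounts: a stable regime, then a higher one.
    payments = [99.0, 101.0] * 100 + [199.0, 201.0] * 100
    labels = label_by_drift(payments)
    print("concepts found:", labels[-1] + 1)
```

The resulting labels could then serve as class targets for a supervised learner, in the spirit of the labeling step described above. In this sketch the change is flagged some time after it actually occurs, because the recent window must contain enough post-change values before the test becomes significant.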
