We apply machine learning to the problem of subpopulation assessment for cesarean section. In subpopulation assessment, we are interested in making predictions not for a single patient but for groups of patients. Typically, in any large population, different subpopulations have different outcome rates. In our example, the C-section rate of a population of 22,176 expectant mothers is 16.8%, yet the 17 physician groups that serve this population have widely varying group C-section rates, ranging from 11% to 23%. The ultimate goal of subpopulation assessment is to determine whether these variations in the observed rates can be attributed to (a) variations in the intrinsic risk of the patient subpopulations (i.e., some groups contain more patients at high risk of C-section), or (b) differences in physician practice (i.e., some groups perform more C-sections). Our results indicate that although there is some variation in intrinsic risk, there is also substantial variation in physician practice.
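One common way to frame the choice between (a) and (b) is indirect standardization: fit a patient-level risk model, average its predicted probabilities within each group to obtain that group's expected rate, and compare it with the group's observed rate. The sketch below assumes such per-patient probabilities are already available from some model trained on patient features only; the names `summarize_groups`, `GroupSummary`, and `oe_ratio` are hypothetical, and this illustrates the general idea rather than the specific procedure used in this work.

```python
from dataclasses import dataclass


@dataclass
class GroupSummary:
    """Observed vs. model-expected outcome rate for one subpopulation (hypothetical helper)."""
    group: str
    n: int
    observed_rate: float   # fraction of the group's patients with the outcome (e.g., C-section)
    expected_rate: float   # mean predicted risk of the group's patients

    @property
    def oe_ratio(self) -> float:
        # > 1: more outcomes than the patients' predicted intrinsic risk implies
        # < 1: fewer outcomes than predicted
        return self.observed_rate / self.expected_rate


def summarize_groups(groups, outcomes, predicted_risk):
    """Aggregate patient-level outcomes and predicted risks by group.

    groups:          group label per patient (e.g., physician group)
    outcomes:        1 if the patient had the outcome, else 0
    predicted_risk:  probability of the outcome from a patient-level model
                     (assumed to use patient features only, not the group label)
    """
    if not (len(groups) == len(outcomes) == len(predicted_risk)):
        raise ValueError("groups, outcomes and predicted_risk must have the same length")

    totals = {}  # group -> [patient count, observed outcomes, summed predicted risk]
    for g, y, p in zip(groups, outcomes, predicted_risk):
        t = totals.setdefault(g, [0, 0, 0.0])
        t[0] += 1
        t[1] += y
        t[2] += p

    return {
        g: GroupSummary(group=g, n=n, observed_rate=obs / n, expected_rate=exp / n)
        for g, (n, obs, exp) in totals.items()
    }


if __name__ == "__main__":
    # Toy, made-up data: two groups of four patients each.
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    outcomes = [1, 0, 0, 0, 1, 1, 0, 0]
    predicted_risk = [0.3, 0.2, 0.2, 0.1, 0.2, 0.2, 0.1, 0.1]
    for s in summarize_groups(groups, outcomes, predicted_risk).values():
        print(f"{s.group}: observed={s.observed_rate:.2f} "
              f"expected={s.expected_rate:.2f} O/E={s.oe_ratio:.2f}")
```

On this made-up data, group B's observed rate of 0.50 against an expected rate of 0.15 gives an O/E ratio of about 3.3, which under this framing would point toward practice differences (b). A group whose high observed rate is matched by an equally high expected rate would instead point toward intrinsic risk (a).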