Hierarchical Models for Screening of Iron Deficiency Anemia

We investigate the problem of classifying individuals based on estimated density functions for each individual. Given labelled histograms characterizing red blood cells (RBCs) for different individuals, the learning problem is to build a classifier which can classify new unlabelled histograms into normal and iron deficient classes. Thus, the problem is similar to conventional classification in that there is labelled training data, but different in that the underlying measurements are not feature vectors but histograms or density estimates. We describe a general framework based on probabilistic hierarchical models for modeling such data and illustrate how the model lends itself to classification. We contrast this approach with two alternatives: (1) directly defining a distance between densities using a cross-entropy distance measure, and (2) using parameters of the estimated densities as feature vectors in a standard discriminative classification framework. We evaluate all three methods on a real-world data set consisting of 180 subjects. The hierarchical modeling and density-distance approaches are the most accurate, yielding cross-validated error rates in the range of 1 to 2%. We conclude by discussing the relative merits of each approach, including the interpretability of each model from a clinical diagnostic viewpoint.
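To make the density-distance alternative concrete, the following is a minimal sketch of classifying a histogram by its nearest labelled neighbour under a cross-entropy-style distance. It is an illustration under stated assumptions, not the method as specified in the paper: the symmetrized Kullback-Leibler divergence, the smoothing constant `eps`, the nearest-neighbour rule, the function names, and the toy histograms are all assumptions introduced here.

```python
import math


def normalize(counts, eps=1e-9):
    """Convert raw bin counts to probabilities, smoothing empty bins.

    The smoothing constant eps is an assumption; it keeps log(p/q) finite.
    """
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) between two discrete densities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(counts_a, counts_b):
    """Symmetrized KL divergence, used here as an assumed cross-entropy distance."""
    p, q = normalize(counts_a), normalize(counts_b)
    return kl_divergence(p, q) + kl_divergence(q, p)


def classify_nearest(query, labelled):
    """Return the label of the labelled histogram closest to the query."""
    _, best_label = min(
        (symmetric_kl(query, hist), label) for hist, label in labelled
    )
    return best_label


# Made-up toy histograms (5 bins each); not data from the study.
training = [
    ([2, 10, 30, 10, 2], "normal"),
    ([20, 25, 8, 1, 0], "iron deficient"),
]
prediction = classify_nearest([3, 12, 28, 9, 2], training)
print(prediction)
```

All histograms must share the same bin edges for the bin-by-bin comparison to be meaningful. A symmetrized divergence is used so that the distance does not depend on which histogram is treated as the reference.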