Detection of two-component mixtures of lognormal distributions in grouped, doubly truncated data: analysis of red blood cell volume distributions.

We have examined the statistical requirements for detecting a mixture of two lognormal distributions in doubly truncated data when the sample size is large. Parameters were estimated with the expectation-maximization (EM) algorithm, and the presence of a mixture was tested by bootstrapping the likelihood ratio statistic. Analysis of computer-simulated mixtures showed that as the ratio of the difference between the component means to the smaller of the two standard deviations increases, the power of detection increases and the parameter estimates become more accurate.

These procedures were applied to the distribution of red blood cell volume in blood samples. Each distribution was doubly truncated to eliminate artifactual frequency counts and then tested for best fit to either a single lognormal distribution or a mixture of two lognormal distributions. A single population was found in samples from 60 healthy individuals. Two subpopulations of cells were detected in 25 of 27 mixtures of blood prepared in vitro. In blood samples from 40 patients treated for iron-deficiency anemia, subpopulations could be detected in every patient by 6 weeks after the onset of treatment. To determine whether two-component mixtures could also be detected in untransfused patients with refractory anemia, their distributions were examined. In two of three patients with inherited sideroblastic anemia, a mixture of microcytic and normocytic cells was found; in the third, a single population of microcytic cells was identified. Mixtures of microcytic and normocytic subpopulations were also found in two family members previously identified as carriers of inherited sideroblastic anemia. Of 25 patients with acquired myelodysplastic anemia, two showed a good fit to a mixture of subpopulations containing abnormal microcytic or macrocytic cells.
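The EM fitting step for grouped, doubly truncated data can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions: the mixture is fitted on the log-volume scale (so each lognormal component becomes a normal one), bin edges are supplied in log units with the first and last edges acting as the truncation points, and the truncated tails are handled by the standard device of adding the expected number of unobserved cells, n·P_tail/P_window, as two extra "bins" (in the spirit of McLachlan and Jones's EM for grouped and truncated data). The function names, starting values and tolerance are hypothetical.

```python
import math

SQRT2PI = math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal CDF (math.erf accepts +/-inf)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _pdf(z):
    """Standard normal density, taken as 0 at +/-inf."""
    return 0.0 if math.isinf(z) else math.exp(-0.5 * z * z) / SQRT2PI


def _zpdf(z):
    """z * phi(z), with its limit 0 at +/-inf (avoids inf * 0 = nan)."""
    return 0.0 if math.isinf(z) else z * _pdf(z)


def interval_moments(a, b, mu, sd):
    """Mass, E[Y] and E[Y^2] of N(mu, sd^2) restricted to the interval (a, b)."""
    al, be = (a - mu) / sd, (b - mu) / sd
    mass = _cdf(be) - _cdf(al)
    if mass <= 0.0:
        # No probability here; the moments are unused because the E-step weight is 0.
        return 0.0, mu, mu * mu + sd * sd
    m1 = (_pdf(al) - _pdf(be)) / mass            # E[X] for truncated N(0, 1)
    m2 = 1.0 + (_zpdf(al) - _zpdf(be)) / mass    # E[X^2] for truncated N(0, 1)
    return mass, mu + sd * m1, mu * mu + 2.0 * mu * sd * m1 + sd * sd * m2


def grouped_truncated_loglik(edges, counts, pi, mu, sd):
    """Multinomial log-likelihood (up to a constant) of counts binned inside the window."""
    probs = [sum(p * interval_moments(a, b, m, s)[0] for p, m, s in zip(pi, mu, sd))
             for a, b in zip(edges[:-1], edges[1:])]
    window = sum(probs)
    return sum(c * math.log(max(q / window, 1e-300)) for c, q in zip(counts, probs) if c > 0)


def em_grouped_truncated(edges, counts, pi, mu, sd, n_iter=500, tol=1e-9):
    """EM for a K-component normal mixture (log scale) fitted to binned, doubly truncated counts.

    edges      -- increasing log-scale bin edges; edges[0] and edges[-1] are the truncation points
    counts     -- observed frequency in each of the len(edges) - 1 bins
    pi, mu, sd -- starting values, one entry per component (K = 1 gives the null model)
    """
    J, K, n = len(counts), len(pi), sum(counts)
    intervals = list(zip(edges[:-1], edges[1:]))
    intervals += [(-math.inf, edges[0]), (edges[-1], math.inf)]  # the two truncated tails
    pi, mu, sd = list(pi), list(mu), list(sd)
    prev = -math.inf
    for _ in range(n_iter):
        mom = [[interval_moments(a, b, mu[k], sd[k]) for k in range(K)] for a, b in intervals]
        mix = [sum(pi[k] * row[k][0] for k in range(K)) for row in mom]
        window = sum(mix[:J])  # mixture probability of landing inside the truncation window
        ll = sum(c * math.log(max(mix[j] / window, 1e-300)) for j, c in enumerate(counts) if c > 0)
        if ll - prev < tol:
            break
        prev = ll
        # E-step: observed bin counts plus the expected number of unseen cells in each tail.
        expected = list(counts) + [n * t / window for t in mix[J:]]
        W, S1, S2 = [0.0] * K, [0.0] * K, [0.0] * K
        for row, c, m in zip(mom, expected, mix):
            if c <= 0 or m <= 0:
                continue
            for k in range(K):
                mass, ey, ey2 = row[k]
                w = c * pi[k] * mass / m  # expected count from component k in this interval
                W[k] += w
                S1[k] += w * ey
                S2[k] += w * ey2
        # M-step: closed-form normal-mixture updates from the expected sufficient statistics.
        total = sum(W)
        pi = [w / total for w in W]
        mu = [S1[k] / W[k] for k in range(K)]
        sd = [math.sqrt(max(S2[k] / W[k] - mu[k] ** 2, 1e-12)) for k in range(K)]
    return pi, mu, sd, grouped_truncated_loglik(edges, counts, pi, mu, sd)
```

Because the function is generic in K, calling it once with a single component and once with two gives the two log-likelihoods whose difference, 2(l2 - l1), is the likelihood ratio statistic. Exponentiating a fitted mean gives the geometric mean cell volume of that subpopulation.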
We have demonstrated that, with large sample sizes, mixtures of distributions can be detected even when the distributions appear unimodal. These statistical techniques provide a means to characterize and quantify alterations in erythrocyte subpopulations in anemia, but they could also be applied to any set of grouped, doubly truncated data to test for the presence of a mixture of two lognormal distributions.
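The bootstrap test for a mixture can be sketched as a parametric bootstrap. The usual chi-squared reference distribution does not hold for the likelihood ratio statistic when testing the number of mixture components, so its null distribution is approximated by refitting to data simulated from the fitted single-lognormal model. In this minimal sketch, `lrt_stat` is a hypothetical callable assumed to fit the one- and two-component models (for example by EM, not shown here) and return 2(l2 - l1); the p-value uses the common (1 + exceedances)/(B + 1) Monte Carlo convention, and the simulator draws log-volumes, discards any outside the truncation window, and bins the rest.

```python
import bisect
import random


def simulate_grouped_truncated(rng, n, edges, pi, mu, sd):
    """Draw n log-volumes from a normal mixture restricted to [edges[0], edges[-1]) and bin them."""
    counts = [0] * (len(edges) - 1)
    kept = 0
    while kept < n:
        k = rng.choices(range(len(pi)), weights=pi)[0]  # pick a component
        y = rng.gauss(mu[k], sd[k])
        if edges[0] <= y < edges[-1]:                   # rejection step = double truncation
            counts[bisect.bisect_right(edges, y) - 1] += 1
            kept += 1
    return counts


def bootstrap_lrt_pvalue(observed_stat, simulate_null, lrt_stat, n_boot=99, seed=0):
    """Parametric-bootstrap p-value for H0: one component vs H1: two components.

    simulate_null(rng) -- returns one data set drawn from the fitted null (single-lognormal) model
    lrt_stat(data)     -- hypothetical: returns 2 * (loglik_two - loglik_one) for that data set
    """
    rng = random.Random(seed)
    exceed = sum(lrt_stat(simulate_null(rng)) >= observed_stat for _ in range(n_boot))
    return (1 + exceed) / (n_boot + 1)
```

Each bootstrap replicate should use the same sample size and the same truncation window and bins as the observed distribution, so that the simulated statistics are comparable with the observed one.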