Patient-specific analysis of sequential haematological data by multiple linear regression and mixture distribution modelling.

Automated storage and analysis of the results of serial haematological studies are now technically feasible with present-day laboratory instruments and devices for data storage and processing. In current practice, physicians mentally compare a laboratory result with previous values and use their clinical judgement to decide whether any change is significant. To provide a statistical basis for this process, we describe a new approach for detecting changes in patient-specific sequential measurements of standard haematological laboratory tests. The methods include hierarchical multiple regression modelling, with a weighted minimum risk criterion for model selection, to identify models indicating changes in mean values over time. This study is the first to analyse sequential patient-specific distributions of laboratory measurements, using mixture distribution modelling with systematic selection of starting values for the EM algorithm. To evaluate these statistical methods under controlled conditions, we studied 11 healthy human volunteers who were depleted of iron by serial phlebotomy until they developed iron-deficiency anaemia, and who were then treated with oral iron supplements to replete iron stores and correct the anaemia. Sequential patient-specific analyses of haemoglobin, haematocrit, and mean cell volume showed that significant departures from past values could, in many cases, be identified even while values were still within the population reference ranges. In addition, for all subjects, sequential alterations in red blood cell volume distributions during the development of iron-deficiency anaemia could be characterized and quantified. These methods promise more sensitive diagnostic evaluation of developing anaemia and more sensitive serial monitoring of the response to therapy.
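
The idea of choosing between nested least-squares models, one with a constant mean and one with a shift in mean, can be sketched for a single patient's series as follows. This is a minimal illustration under stated assumptions, not the authors' procedure. The function names are hypothetical. The series is a made-up haemoglobin sequence. Most importantly, a BIC-style penalized residual sum of squares stands in for the paper's weighted minimum risk criterion, whose exact form is not given here.

```python
import math


def _rss(seg):
    """Residual sum of squares of a segment about its own mean."""
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)


def select_change_model(y, min_seg=2, penalty=None):
    """Choose between two nested least-squares models for one patient's series.

    Model 0: constant mean. Model 1: constant mean plus an indicator regressor
    that switches on at change index k (a single shift in mean).
    ASSUMPTION: a BIC-style criterion replaces the paper's weighted minimum
    risk criterion; it only illustrates penalized model selection.
    """
    n = len(y)
    if penalty is None:
        penalty = math.log(n)  # hypothetical default penalty per parameter

    def criterion(rss, n_params):
        # Gaussian log-likelihood term plus a penalty on model size.
        return n * math.log(max(rss, 1e-12) / n) + penalty * n_params

    crit_const = criterion(_rss(y), 1)  # one parameter: the common mean

    # Scan every admissible change index and keep the best least-squares split.
    best_k, best_rss = None, math.inf
    for k in range(min_seg, n - min_seg + 1):
        rss = _rss(y[:k]) + _rss(y[k:])
        if rss < best_rss:
            best_k, best_rss = k, rss
    # Three parameters: mean before, mean after, and the change index.
    crit_shift = criterion(best_rss, 3) if best_k is not None else math.inf

    if crit_shift < crit_const:
        before = sum(y[:best_k]) / best_k
        after = sum(y[best_k:]) / (n - best_k)
        return {"model": "shift", "change_index": best_k, "shift": after - before}
    return {"model": "constant", "mean": sum(y) / n}


# Hypothetical haemoglobin series (g/dL) for one subject during phlebotomy.
hb = [14.0, 14.2, 13.9, 14.1, 14.0, 12.8, 12.6, 12.9, 12.7, 12.5]
result = select_change_model(hb)
print(result)
```

The shift model is preferred only when its drop in residual variance outweighs the extra parameters. This is what lets a patient's own history flag a change that a population reference range would miss.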
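
Mixture modelling of a red cell volume distribution with EM, started from several systematically chosen points, might look like the sketch below. It rests on several assumptions, none taken from the paper:

- Volumes are lognormal, so a two-component normal mixture is fitted to log volumes.
- The input is individual, ungrouped cell volumes. Analyser output is usually a binned histogram, which this sketch does not handle.
- The starting values come from splitting the sorted data at fixed fractions. This is a hypothetical stand-in for the paper's systematic selection scheme.
- The simulated data are invented for illustration.

```python
import math
import random

SQRT_2PI = math.sqrt(2 * math.pi)


def _pdf(x, mu, sd):
    """Normal density."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * SQRT_2PI)


def _loglik(x, p, mu1, sd1, mu2, sd2):
    return sum(math.log(max(p * _pdf(v, mu1, sd1) + (1 - p) * _pdf(v, mu2, sd2), 1e-300))
               for v in x)


def em_two_normals(x, p, mu1, sd1, mu2, sd2, max_iter=100, tol=1e-7):
    """EM for a two-component normal mixture; returns (log-likelihood, params)."""
    n = len(x)
    prev = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability that each point belongs to component 1.
        r = []
        for v in x:
            a = p * _pdf(v, mu1, sd1)
            b = (1 - p) * _pdf(v, mu2, sd2)
            r.append(a / max(a + b, 1e-300))
        # M-step: weighted mixing proportion, means and standard deviations.
        s1 = max(sum(r), 1e-12)
        s2 = max(n - s1, 1e-12)
        p = s1 / n
        mu1 = sum(ri * v for ri, v in zip(r, x)) / s1
        mu2 = sum((1 - ri) * v for ri, v in zip(r, x)) / s2
        sd1 = max(math.sqrt(sum(ri * (v - mu1) ** 2 for ri, v in zip(r, x)) / s1), 1e-4)
        sd2 = max(math.sqrt(sum((1 - ri) * (v - mu2) ** 2 for ri, v in zip(r, x)) / s2), 1e-4)
        ll = _loglik(x, p, mu1, sd1, mu2, sd2)
        if ll - prev < tol:  # stop when the log-likelihood stops increasing
            break
        prev = ll
    return ll, (p, mu1, sd1, mu2, sd2)


def _mean_sd(seg):
    m = sum(seg) / len(seg)
    return m, max(math.sqrt(sum((v - m) ** 2 for v in seg) / len(seg)), 1e-4)


def fit_lognormal_mixture(volumes, start_fractions=(0.2, 0.35, 0.5, 0.65, 0.8)):
    """Run EM from several data-driven starts and keep the best fit.

    HYPOTHETICAL start scheme: split the sorted log volumes at each fraction q,
    and use the two halves' moments as initial component parameters.
    """
    x = sorted(math.log(v) for v in volumes)
    n = len(x)
    best = None
    for q in start_fractions:
        k = max(2, min(n - 2, int(q * n)))
        (m1, s1), (m2, s2) = _mean_sd(x[:k]), _mean_sd(x[k:])
        ll, params = em_two_normals(x, k / n, m1, s1, m2, s2)
        if best is None or ll > best[0]:
            best = (ll, params)
    ll, (p, mu1, sd1, mu2, sd2) = best
    # Report components as (log-mean, log-sd, weight), smaller cells first.
    return ll, sorted([(mu1, sd1, p), (mu2, sd2, 1 - p)])


# Simulated volumes (fL): a microcytic population mixed with normocytic cells.
rng = random.Random(0)
vols = ([rng.lognormvariate(math.log(65), 0.10) for _ in range(800)]
        + [rng.lognormvariate(math.log(90), 0.08) for _ in range(1200)])
ll, comps = fit_lognormal_mixture(vols)
for mu, sd, w in comps:
    print(f"median {math.exp(mu):.1f} fL, log-sd {sd:.3f}, weight {w:.2f}")
```

Several starts are used because EM converges only to a local maximum of the likelihood. Keeping the highest-likelihood run reduces the dependence on any single initial guess.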
