Is Bland-Altman plot method useful without inference for accuracy, precision, and agreement?

Objective: The Bland and Altman plot is a widely cited and applied graphical method for assessing the equivalence of quantitative measurement techniques, usually with the aim of replacing a traditional technique with a new, less invasive, or less expensive one. Although easy to communicate, the Bland and Altman plot is often misinterpreted because it lacks suitable inferential statistical support. The usual alternatives, such as Pearson's correlation or ordinary least-squares linear regression, also fail to locate the weakness of each measurement technique.

Method: Inferential statistical support for equivalence between measurement techniques is proposed here in the form of three nested tests based on structural regressions. They assess the equivalence of structural means (accuracy), the equivalence of structural variances (precision), and concordance with the structural bisector line (agreement between measurements obtained from the same subject), both by analytical methods and by a robust bootstrap approach. Graphical outputs are also implemented, following Bland and Altman's principles for easy communication.

Results: The performance of this method is demonstrated on five data sets from previously published articles that applied Bland and Altman's method. One case demonstrated strict equivalence, three cases showed partial equivalence, and one showed poor equivalence. The R package developed for this work, with open code and data, is freely available with installation instructions at Harvard Dataverse: https://doi.org/10.7910/DVN/AGJPZH.

Conclusion: It is possible to test whether two techniques are fully equivalent while preserving graphical communication according to Bland and Altman's principles and adding robust and suitable inferential statistics. Decomposing equivalence into accuracy, precision, and agreement helps locate the source of a problem so that a new technique can be fixed.
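To make the idea of testing concordance with the bisector line more concrete, the following is a minimal Python sketch and not the authors' R package. It fits a Deming (structural) regression and builds percentile bootstrap intervals for the intercept and slope, which can then be compared with the bisector values 0 and 1. The function names `deming_fit` and `bootstrap_ci`, the assumed error-variance ratio `delta = 1`, the simulated data, and the use of a simple percentile bootstrap are all illustrative assumptions; the three nested tests described in the abstract are more specific than this.

```python
import numpy as np


def deming_fit(x, y, delta=1.0):
    """Deming (structural) regression of y on x.

    delta is the ASSUMED ratio var(error in y) / var(error in x);
    delta = 1 corresponds to orthogonal regression.
    Returns (intercept, slope). Undefined when the sample covariance is 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    diff = syy - delta * sxx
    slope = (diff + np.sqrt(diff**2 + 4.0 * delta * sxy**2)) / (2.0 * sxy)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope


def bootstrap_ci(x, y, n_boot=2000, level=0.95, delta=1.0, seed=0):
    """Percentile bootstrap intervals for (intercept, slope).

    Subjects (x, y pairs) are resampled with replacement.
    Returns (lower, upper), each an array ordered as [intercept, slope].
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    fits = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample pairs, keep pairing intact
        fits[b] = deming_fit(x[idx], y[idx], delta=delta)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(fits, [alpha, 1.0 - alpha], axis=0)
    return lower, upper


# Hypothetical data: two techniques measuring the same true value,
# each with its own independent measurement error.
rng = np.random.default_rng(42)
true_value = rng.normal(100.0, 15.0, size=200)
x = true_value + rng.normal(0.0, 3.0, size=200)  # technique A
y = true_value + rng.normal(0.0, 3.0, size=200)  # technique B

intercept, slope = deming_fit(x, y)
lower, upper = bootstrap_ci(x, y, n_boot=500)

# The bisector line has intercept 0 and slope 1.
bisector_inside = (lower[0] <= 0.0 <= upper[0]) and (lower[1] <= 1.0 <= upper[1])
print(f"intercept = {intercept:.3f}, interval = ({lower[0]:.3f}, {upper[0]:.3f})")
print(f"slope     = {slope:.3f}, interval = ({lower[1]:.3f}, {upper[1]:.3f})")
print("bisector values inside both intervals:", bisector_inside)
```

Checking the two intervals separately is a simplification: a joint test of intercept and slope would use a confidence region rather than two marginal intervals, and the choice of `delta` matters whenever the two techniques differ in precision.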
