Method agreement analysis: a review of correct methodology.

The correct approach to analyzing method agreement is discussed. Appropriate procedures are described, and worked examples shown, for three situations: agreement between two measurements on the same samples (repeatability), agreement between two individuals using identical methodology on identical samples (reproducibility), and comparison of two methods. The correct approaches for both categorical and numerical variables are explained. More complex analyses involving a comparison of more than two pairs of data are mentioned, and guidance for these analyses is given. Simple formulae for calculating the approximate sample size needed for agreement analysis are also given. Examples of good practice from the reproduction literature are cited, and common errors of methodology are indicated.
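The abstract does not say which statistics the review recommends. As an illustration only, the sketch below assumes two widely used choices: Bland-Altman 95% limits of agreement for numerical variables and Cohen's kappa for categorical variables. The function names and the example data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from statistics import mean, stdev


def limits_of_agreement(a, b):
    """Bias and approximate 95% limits of agreement for paired numerical readings.

    Assumes the paired differences are roughly normally distributed.
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)   # mean difference between the two methods
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def cohen_kappa(r1, r2):
    """Chance-corrected agreement between two raters on categorical labels.

    Undefined (division by zero) when expected agreement equals 1.
    """
    n = len(r1)
    observed = sum(x == y for x, y in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(c1[k] * c2[k] for k in c1) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical data: two assays on four samples, two raters on four samples.
method_a = [10, 12, 14, 16]
method_b = [9, 12, 15, 15]
bias, lower, upper = limits_of_agreement(method_a, method_b)

rater_1 = ["pos", "pos", "neg", "neg"]
rater_2 = ["pos", "neg", "neg", "neg"]
kappa = cohen_kappa(rater_1, rater_2)
```

Both functions work on paired observations, which is why a correlation coefficient alone is a poor substitute: two methods can correlate strongly while differing by a constant bias.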
