Interrater and Intrarater Reliability in the Diagnosis and Staging of Endometriosis

OBJECTIVE: To estimate the interrater and intrarater reliability of endometriosis diagnosis and disease severity ratings among gynecologic surgeons viewing operative digital images.

METHODS: The study population comprised a random sample (n=148 [36%]) of women who participated in the Endometriosis: Natural History, Diagnosis and Outcomes study. Four academic expert surgeons and four local, specialized expert surgeons reviewed the images, diagnosed the presence or absence of endometriosis for each woman, and rated disease severity using the revised American Society for Reproductive Medicine (ASRM) criteria. Interrater and intrarater agreement were calculated for both endometriosis diagnosis and staging.

RESULTS: The interrater reliability for endometriosis diagnosis among the eight surgeons was substantial: Fleiss κ=0.69 (95% confidence interval [CI] 0.64–0.74). Using experience-based assessment, surgeons agreed on the revised ASRM stage in a majority of cases (mean 61%, range 52–75%), with moderate interrater reliability: Fleiss κ=0.44 (95% CI 0.41–0.47). The intrarater reliability of experience-based assessment compared with computer-assisted revised ASRM staging was almost perfect (mean weighted κ=0.95, range 0.89–0.99).

CONCLUSION: Substantial reliability was found for endometriosis diagnosis by revised ASRM criteria, whereas moderate reliability was observed for staging. Almost perfect reliability was observed for surgeons' ratings of disease severity compared with computer-assisted, checklist-based staging. The findings suggest that reliability in endometriosis diagnosis is not greatly altered by the location or composition of the surgeon group, supporting multisite studies or the compilation of endometriosis data across clinical centers. Although surgeons appear to be skilled at assessing endometriosis stage intuitively, how staging of disease burden correlates with clinical outcomes remains to be determined.

LEVEL OF EVIDENCE: II
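The agreement statistics named in the abstract can be illustrated with a short sketch. The Python below is not the study's analysis code. It assumes complete data (every surgeon rates every woman), free-text category labels for the diagnosis (Fleiss κ), and revised ASRM stages I to IV coded 0 to 3 for the weighted κ. The abstract does not say which weighting scheme was used, so quadratic weights are an assumption here (linear weights are selectable). The function names `fleiss_kappa`, `weighted_kappa`, and `landis_koch` are hypothetical helpers. Confidence intervals are not computed. `landis_koch` maps κ to the verbal bands of Landis and Koch (1977), which are the source of the "moderate", "substantial", and "almost perfect" labels above.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for N subjects, each rated by the same n raters.

    ratings[i] holds the n category labels given to subject i, e.g. the
    eight surgeons' "endometriosis" / "no endometriosis" calls for one woman.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]  # n_ij for each subject i

    # Observed agreement for subject i: P_i = (sum_j n_ij^2 - n) / (n(n - 1))
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from pooled category proportions: P_e = sum_j p_j^2
    total = n_subjects * n_raters
    p_e = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating is the same category")
    return (p_bar - p_e) / (1 - p_e)


def weighted_kappa(a: Sequence[int], b: Sequence[int], k: int,
                   weights: str = "quadratic") -> float:
    """Cohen's weighted kappa for two ordinal ratings coded 0..k-1.

    Assumption: stages I-IV coded 0-3 (k=4); weights is "quadratic" or "linear".
    """
    if k < 2 or len(a) != len(b) or not a:
        raise ValueError("need k >= 2 and two non-empty rating lists of equal length")
    if any(not 0 <= x < k for x in (*a, *b)):
        raise ValueError("ratings must be integers in 0..k-1")
    power = {"linear": 1, "quadratic": 2}[weights]
    n = len(a)

    # Observed joint proportions and each rater's marginal proportions
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        # Penalty grows with the distance between categories
        return (abs(i - j) / (k - 1)) ** power

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs_dis = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    exp_dis = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    if exp_dis == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1 - obs_dis / exp_dis


def landis_koch(kappa: float) -> str:
    """Verbal band for kappa following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

Applied to the headline values, `landis_koch(0.69)`, `landis_koch(0.44)` and `landis_koch(0.95)` return "substantial", "moderate" and "almost perfect", matching the wording of the abstract. Weighted κ is used for the staging comparison because stages are ordinal: a one-stage disagreement is penalized less than a three-stage one, which unweighted κ would treat identically.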
