Tongue Image Database Construction Based on Expert Opinions: Assessment of Individual Agreement and Methods for Expert Selection

This study aims to introduce a method of individual agreement evaluation that identifies discordant raters within an expert group, excludes them, and determines the best expert selection method, thereby improving the reliability of a tongue image database constructed from expert opinions. Fifty experienced experts in TCM diagnosis from all over China were invited to rate 300 randomly selected tongue images. Gwet's AC1 (first-order agreement coefficient) was used to calculate interrater and intrarater agreement. Optimization of the interrater agreement and a disagreement score were proposed to evaluate the external consistency of each individual expert. The proposed method successfully optimized the interrater agreement. Comparing three expert selection methods, the interrater agreement increased from 0.53 [0.32-0.75] for the original group to 0.64 [0.39-0.80] with method A (inclusion of experts whose intrarater agreement > 0.6), 0.69 [0.63-0.81] with method B (inclusion of experts whose disagreement score = 0), and 0.76 [0.67-0.83] with method C (inclusion of experts whose intrarater agreement > 0.6 and disagreement score = 0). In this study, we provide an estimate of external consistency for each individual expert, and considering both the internal consistency and the external consistency of each expert is superior to either criterion alone when constructing a tongue image database based on expert opinions.
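The agreement statistic referenced in the abstract is Gwet's AC1. The sketch below, in Python with NumPy, shows one common multi-rater formulation of AC1 (observed agreement Pa against a chance term Pe built from average category prevalences) and how a selection rule such as method C could be applied. The function names, the missing-value convention, and the per-expert input arrays are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Minimal sketch (not the authors' code) of the multi-rater Gwet's AC1
# used to score interrater agreement; the -1 missing-value convention
# and the function names are illustrative assumptions.
def gwet_ac1(ratings, n_categories=None):
    """Gwet's AC1 for nominal ratings.

    ratings : (n_subjects, n_raters) integer array; each entry is a
        category label in {0, ..., Q-1}, or -1 for a missing rating.
    """
    ratings = np.asarray(ratings)
    if n_categories is None:
        n_categories = int(ratings[ratings >= 0].max()) + 1

    # r_iq: number of raters assigning subject i to category q
    r_iq = np.stack(
        [(ratings == q).sum(axis=1) for q in range(n_categories)], axis=1
    ).astype(float)
    r_i = r_iq.sum(axis=1)  # raters who actually rated subject i

    # Observed agreement Pa over subjects rated by at least two raters
    multi = r_i >= 2
    pa_i = (r_iq[multi] * (r_iq[multi] - 1)).sum(axis=1) / (
        r_i[multi] * (r_i[multi] - 1)
    )
    pa = pa_i.mean()

    # Chance agreement Pe from the average category prevalences pi_q
    pi_q = (r_iq / r_i[:, None]).mean(axis=0)
    pe = (pi_q * (1.0 - pi_q)).sum() / (n_categories - 1)

    return (pa - pe) / (1.0 - pe)


# Illustration of selection method C: keep only experts whose intrarater
# AC1 (test-retest on repeated images) exceeds 0.6 and whose disagreement
# score is 0, then recompute the panel's interrater AC1. `intra_ac1` and
# `disagreement` are assumed per-expert arrays, not values from the paper.
def select_experts_method_c(ratings, intra_ac1, disagreement):
    keep = (np.asarray(intra_ac1) > 0.6) & (np.asarray(disagreement) == 0)
    return gwet_ac1(np.asarray(ratings)[:, keep])
```

In this sketch, methods A and B correspond to applying only the intrarater threshold or only the disagreement-score filter before recomputing the panel AC1.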
