Capturing Expert Arguments from Medical Adjudication Discussions in a Machine-readable Format

Group-based discussion among human graders can be a useful tool for capturing sources of disagreement in ambiguous classification tasks and for adjudicating any resolvable disagreements. Existing workflows for panel-based adjudication, however, capture graders' arguments and rationales in a free-form, unstructured format, which limits the potential for automatic analysis of the discussion contents. We designed and implemented a structured adjudication system that collects graders' arguments in a machine-readable format without limiting graders' ability to provide free-form justifications for their classification decisions. Our system enables graders to cite instructions from a set of labeling guidelines, specified as discrete classification rules together with the conditions that must be met for each rule to apply. In the present work, we outline the process of designing and implementing this adjudication system and report preliminary findings from deploying it in the context of medical time series analysis for sleep stage classification.
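To make the idea of a machine-readable argument concrete, the following Python sketch shows one possible way to represent guideline rules as discrete entries with explicit preconditions, and to let a grader's argument cite a rule along with the conditions it claims are satisfied. All class and field names (Condition, GuidelineRule, Argument, and so on) are hypothetical illustrations, not the schema actually used by the system described here.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Condition:
    """A single precondition that must hold for a guideline rule to apply."""
    condition_id: str
    description: str  # e.g., a signal feature named in the scoring manual

@dataclass
class GuidelineRule:
    """A discrete classification rule drawn from the labeling guidelines."""
    rule_id: str                 # identifier of the guideline instruction
    assigned_label: str          # the class the rule assigns, e.g., a sleep stage
    conditions: List[Condition] = field(default_factory=list)

@dataclass
class Argument:
    """One grader's structured argument for a label on a given segment."""
    grader_id: str
    segment_id: str
    proposed_label: str
    cited_rule_id: str                       # which guideline rule is invoked
    satisfied_condition_ids: List[str]       # conditions the grader claims hold
    free_text_justification: str = ""        # optional unstructured rationale

Represented this way, arguments collected during adjudication can be queried automatically, for example to count how often a particular rule or condition is cited across disagreements, while the free-text field still preserves the grader's unconstrained justification.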
