Building an automated SOAP classifier for emergency department reports

Information extraction applications that derive structured event and entity information from unstructured text can leverage knowledge of clinical report structure to improve performance. The Subjective, Objective, Assessment, Plan (SOAP) framework, used to structure progress notes so as to facilitate problem-specific clinical decision making by physicians, is one example of a well-known, canonical structure in the medical domain. Although its applicability to structuring data is understood, its contribution to information extraction tasks has not yet been determined. The first step in evaluating the SOAP framework's usefulness for clinical information extraction is to apply the model to clinical narratives and to develop an automated classifier that assigns SOAP classes to sentences from clinical reports. In this quantitative study, we applied the SOAP framework to sentences from emergency department reports, then trained and evaluated SOAP classifiers built with various linguistic features. We found that the SOAP framework can be applied manually to emergency department reports with high agreement (Cohen's kappa coefficients over 0.70). Using a variety of features, we found that classifiers for each SOAP class can be built with moderate to outstanding performance, achieving F1 scores of 93.9 (subjective), 94.5 (objective), 75.7 (assessment), and 77.0 (plan). We look forward to expanding the framework and applying SOAP classification to clinical information extraction tasks.
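The abstract reports two evaluation measures: Cohen's kappa for agreement between manual annotators, and per-class F1 for the classifiers. The sketch below shows how these two metrics are conventionally computed. It is not the authors' evaluation code. It assumes that each SOAP class is scored as a separate binary decision (sentence in the class or not) and that F1 is reported on a 0 to 100 scale, matching the figures above. The function names and toy labels are hypothetical and are not data from the study.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same sentences."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and the same length")
    n = len(a)
    # Observed agreement: fraction of sentences given the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each annotator's own label distribution.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, so treat it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def f1_score(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """F1 (0-100) for one SOAP class treated as a binary in-class decision."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be the same length")
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical SOAP labels from two annotators for six sentences.
a = ["S", "S", "O", "O", "A", "P"]
b = ["S", "S", "O", "A", "A", "P"]
print(round(cohens_kappa(a, b), 3))  # 0.778

# Hypothetical gold vs. predicted membership in one class (e.g. "A").
gold = [True, True, False, False, True]
pred = [True, False, False, True, True]
print(round(f1_score(gold, pred), 1))  # 66.7
```

Kappa corrects raw agreement for the agreement expected by chance. That correction is why it is preferred over simple percent agreement when, as in clinical notes, some classes (such as subjective and objective) are far more frequent than others.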
