Classifying clinical notes with pain assessment using machine learning

Abstract

Pain is a significant public health problem, affecting millions of people in the USA. Evidence has highlighted that patients with chronic pain often experience deficits in pain care quality (PCQ), including pain assessment, treatment, and reassessment. Currently, there is no intelligent and reliable approach to identifying PCQ indicators in electronic health records (EHR). Here, we used unstructured text narratives in the EHR to detect pain assessment in clinical notes for patients with chronic pain. Our dataset includes patients with documented pain intensity ratings ≥ 4 and initial musculoskeletal diagnoses (MSD), captured by ICD-9-CM codes, in fiscal year 2011, with a minimum of 1 year of follow-up (3-year maximum) and complete data on key demographic variables. A total of 92 patients with 1058 notes were used. First, we manually annotated qualifiers and descriptors of pain assessment using an annotation schema we previously developed. Second, we developed a reliable classifier for indicators of pain assessment in clinical notes. Based on our annotation schema, we found variations in how the subclasses of pain assessment were documented. In positive notes (those annotated as containing pain assessment), providers most often documented pain site (67%) and pain intensity (57%), followed by persistence (32%). Providers documented a presumed etiology for the pain complaint or a diagnosis in only 27% of positive notes, and patients' reports of factors that aggravate pain appeared in only 11%. Compared with other classifiers, the random forest classifier achieved the best performance in labeling clinical notes with pain assessment information, with an accuracy of 94%, PPV of 95%, F1-score of 94%, and AUC of 94%. Despite the wide spectrum of research that applies machine learning in clinical settings, none has explored these methods for pain assessment research.
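The four metrics reported above can be computed from per-note gold labels and classifier outputs. The following is a minimal Python sketch, assuming binary labels (1 = the note contains pain assessment, 0 = it does not), predicted probabilities for the positive class, and a 0.5 decision threshold; the function names, threshold, and toy values are illustrative assumptions, not the study's code or data. AUC is computed here as the probability that a randomly chosen positive note is scored higher than a randomly chosen negative note, with ties counted as one half.

```python
"""Illustrative evaluation metrics for a binary note classifier (sketch only)."""


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = pain assessment present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of notes whose predicted label matches the gold label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def ppv(y_true, y_pred):
    """Positive predictive value (precision): TP / (TP + FP)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative note")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not results from the study).
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.1]               # predicted P(pain assessment)
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(accuracy(y_true, y_pred), ppv(y_true, y_pred),
      f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

On the toy data, one positive note is missed and one negative note is flagged, giving an accuracy of 0.6, a PPV and F1-score of 2/3, and an AUC of 5/6. The pairwise AUC loop is quadratic in the number of notes, which is acceptable for about a thousand notes; library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.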
In addition, previous studies using large datasets to detect and analyze characteristics of patients with various types of pain have relied exclusively on billing and coded data as their source of information. This study, in contrast, harnessed unstructured narrative text from the EHR to detect clinical notes containing pain assessment. We developed a random forest classifier to identify clinical notes with pain assessment information; compared with other classifiers, it achieved the best results on most of the reported metrics.

Graphical abstract: Framework for detecting pain assessment in clinical notes.
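The abstract does not specify how note text was converted into classifier input. One common representation for note-level classification is a bag-of-words count vector, which a random forest implementation (for example, scikit-learn's `RandomForestClassifier`) can take as a feature matrix. The minimal Python sketch below assumes lowercase alphabetic tokens and a vocabulary built from the training notes; the regular expression, function names, and example notes are hypothetical and are not taken from the study.

```python
import re
from collections import Counter

# Assumed tokenization: lowercase runs of letters; digits and punctuation are dropped.
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(note):
    """Lowercase a clinical note and split it into alphabetic word tokens."""
    return TOKEN_RE.findall(note.lower())


def build_vocabulary(notes, min_count=1):
    """Map every token seen at least `min_count` times to a column index (alphabetical)."""
    counts = Counter(tok for note in notes for tok in tokenize(note))
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: i for i, tok in enumerate(kept)}


def vectorize(note, vocab):
    """Return a term-count row for `note`; tokens missing from `vocab` are ignored."""
    row = [0] * len(vocab)
    for tok in tokenize(note):
        idx = vocab.get(tok)
        if idx is not None:
            row[idx] += 1
    return row


# Hypothetical notes, written for illustration only.
notes = [
    "Pt reports low back pain, 7/10, worse with sitting.",
    "Follow-up for hypertension; BP controlled.",
]
vocab = build_vocabulary(notes)
X = [vectorize(n, vocab) for n in notes]  # one row per note, one column per token
print(len(vocab), X[0][vocab["pain"]], X[1][vocab["pain"]])
```

With this tokenizer the numeric rating "7/10" is discarded, so a feature set aimed at pain assessment would likely need to retain numbers or patterns such as intensity scores; that choice is left open in this sketch.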
