Construct validity of six sentiment analysis methods in the text of encounter notes of patients with critical illness

Sentiment analysis may offer insights into patient outcomes through the subjective expressions made by clinicians in the text of encounter notes. We analyzed the predictive, concurrent, convergent, and content validity of six sentiment methods in a sample of 793,725 multidisciplinary clinical notes from 41,283 hospitalizations associated with an intensive care unit stay. None of these approaches improved early prediction of in-hospital mortality in logistic regression models, although they did improve both discrimination and calibration in random forest models. Additionally, positive sentiment measured by the CoreNLP (OR 0.04, 95% CI 0.002-0.55), Pattern (OR 0.09, 95% CI 0.04-0.17), sentimentr (OR 0.37, 95% CI 0.25-0.63), and Opinion (OR 0.25, 95% CI 0.07-0.89) methods was inversely associated with death on the concurrent day after adjustment for demographic characteristics and illness severity. Median daily lexical coverage ranged from 5.4% to 20.1% across methods. Although sentiment scores from all methods were positively correlated with one another, agreement between methods was weak. Sentiment analysis holds promise for clinical applications but will require a novel domain-specific method applicable to clinical text.
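Two quantities in the abstract are easy to conflate: lexical coverage and the difference between correlation and agreement. The sketch below illustrates both under stated assumptions. Lexical coverage is taken to be the fraction of a note's tokens that appear in a method's lexicon, and agreement is measured here as Cohen's kappa on scores discretized into negative, neutral, and positive labels. The toy lexicon, the 0.05 neutrality threshold, and the use of kappa are hypothetical choices for illustration only; the abstract does not state which agreement statistic the study used. The example shows how two methods can be perfectly correlated yet agree no better than chance once their scores are compared on a common polarity scale.

```python
from math import sqrt

# Hypothetical toy lexicon; real methods (e.g. Pattern, Opinion) ship their own word lists.
TOY_LEXICON = {"stable": 1.0, "improving": 1.0, "comfortable": 0.5,
               "worsening": -1.0, "poor": -1.0, "critical": -0.5}


def lexical_coverage(tokens, lexicon):
    """Fraction of tokens in a note that appear in the sentiment lexicon."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.lower() in lexicon)
    return hits / len(tokens)


def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def polarity(score, eps=0.05):
    """Map a continuous score to -1, 0, or +1 (eps is an assumed neutral band)."""
    if score > eps:
        return 1
    if score < -eps:
        return -1
    return 0


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length lists of categorical labels."""
    n = len(a)
    labels = set(a) | set(b)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    expected = sum((a.count(k) / n) * (b.count(k) / n) for k in labels)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    note = "patient remains stable and comfortable overnight".split()
    print(round(lexical_coverage(note, TOY_LEXICON), 3))  # 2 of 6 tokens -> 0.333

    # Hypothetical scores from two methods on the same five notes:
    # method_b is method_a shifted down by 0.4, so they are perfectly correlated.
    method_a = [0.10, 0.20, 0.30, 0.40, 0.50]
    method_b = [-0.30, -0.20, -0.10, 0.00, 0.10]
    print(round(pearson(method_a, method_b), 3))  # 1.0

    labels_a = [polarity(s) for s in method_a]
    labels_b = [polarity(s) for s in method_b]
    print(round(cohen_kappa(labels_a, labels_b), 3))  # 0.0: no agreement beyond chance
```

A constant offset between two scoring scales leaves the correlation untouched but changes which notes each method calls positive, which is one way sentiment from all six methods could be positively correlated while their agreement remains weak.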
