Detecting clinically related content in online patient posts

Patients with chronic health conditions use online health communities to seek support and information to help manage their condition. For clinically related topics, patients can benefit from the opinions of clinical experts, and many are concerned about misinformation and biased information being spread online. However, the large volume of community posts makes it challenging for moderators and clinical experts, where available, to provide the necessary information. Automatically identifying forum posts that need validated clinical resources can help online health communities manage content exchange efficiently. This automation can also help patients in need of clinical expertise get proper help. We present our results from testing text classification models that efficiently and accurately identify community posts containing clinical topics. We annotated 1817 posts, comprising 4966 sentences, from an existing online diabetes community. We found that our classifier performed best (F-measure: 0.83, Precision: 0.79, Recall: 0.86) when using the Naïve Bayes algorithm with unigrams, bigrams, trigrams, and MetaMap Semantic Types as features. Training took 5 s, and classification took a fraction of a second. We applied our classifier to another online diabetes community, and the results were: F-measure: 0.63, Precision: 0.57, Recall: 0.71. Our results show that, once common errors are properly addressed, the model can feasibly be scaled to other forums to identify posts containing clinical topics.
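The classification setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a multinomial Naïve Bayes variant with add-one smoothing, uses only unigram, bigram, and trigram features (the MetaMap Semantic Type features are omitted because they require the external MetaMap tool), and trains on a handful of invented sentences rather than the annotated corpus. The names `ngrams`, `NaiveBayes`, and `precision_recall_f`, and the labels `clinical` and `other`, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, max_n=3):
    """Return unigram, bigram, and trigram features for one sentence."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(self.feat_counts[c].values())
                       for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for f in ngrams(text):
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((self.feat_counts[c][f] + 1)
                                      / (self.totals[c] + vocab_size))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def precision_recall_f(gold, pred, positive="clinical"):
    """Precision, recall, and F-measure for the positive class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f


# Toy, invented training sentences (assumption: NOT from the annotated corpus).
train = [
    ("my doctor raised my insulin dose last week", "clinical"),
    ("what is a normal blood sugar after meals", "clinical"),
    ("metformin gives me stomach pain", "clinical"),
    ("thanks everyone for the warm welcome", "other"),
    ("happy birthday hope you have a great day", "other"),
    ("i love this forum and the people here", "other"),
]
clf = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
```

Calling `clf.predict(sentence)` returns one of the two labels, and `precision_recall_f(gold, pred)` computes the same three metrics the abstract reports, with F-measure taken as the harmonic mean of precision and recall. A realistic replication would need the annotated sentences, a proper tokenizer, and the semantic-type features, none of which are reproduced here.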
