Narrative Detection in Online Patient Communities

Although narratives on patient forums are a valuable source of medical information, their systematic detection and analysis have so far been limited to a single study. In this study, we examine whether psycholinguistic features or document embeddings can aid the identification of narratives. We also investigate which features distinguish narratives from other social media posts. This study is the first to automatically identify the topics discussed in narratives on a patient forum. Our results show that, for classifying narratives, character 3-grams outperform psycholinguistic features and document embeddings. We find that narratives are characterized by the use of the past tense, health-related words, and first-person pronouns, whereas non-narrative text is associated with the future tense, emotional support words, and second-person pronouns. Topic analysis of the patient narratives uncovered fourteen different medical topics, ranging from tumor surgery to side effects. Future work will use these methods to extract experiential patient knowledge from social media.
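To make the notion of character 3-gram features concrete, the following minimal Python sketch counts the overlapping three-character sequences in a post. It is an illustration only: the abstract does not specify the normalization, weighting, or classifier used in the study, so the lowercasing, the whitespace collapsing, and the function names `char_ngrams` and `ngram_features` are assumptions made here.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a post.

    Assumption: the text is lowercased and runs of whitespace are
    collapsed to one space; the source does not state its preprocessing.
    """
    normalized = " ".join(text.lower().split())
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def ngram_features(text, n=3):
    """Map each character n-gram in a post to its raw count."""
    return Counter(char_ngrams(text, n))


# Hypothetical forum post, used only for illustration.
post = "I had surgery last year"
features = ngram_features(post)
```

Count vectors of this kind can serve as input to a standard text classifier; which classifier and weighting scheme the study used is not stated in the abstract.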
