Identifying Consumer Health Terms of Side Effects in Twitter Posts

The prevalence of social media has driven a growing number of health-related applications that use information shared by online users. It is well known that healthcare professionals and laypeople express the same health concepts in different terms. Bridging this gap is particularly important for health-related applications that rely on social media data. We developed a data-driven method based on attributional similarity to identify Twitter terms related to side effect concepts. For the 10 most common side effect (symptom) concepts, our method identified a total of 333 Twitter terms, of which only 90 map to terms in the consumer health vocabulary (CHV). The identified Twitter terms are specific to Twitter data, indicating a need to expand the existing CHV, and many of them appear to have less word-sense ambiguity than the corresponding CHV terms.
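As a rough illustration of what an attributional similarity measure can look like, the sketch below represents each term by the counts of words that co-occur with it in a small window and ranks candidate terms by cosine similarity to a seed side effect term. This is only one common way to realize the idea and is not necessarily the method used in this work; the function names, the window size, and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def context_vectors(posts, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions.

    Assumption: simple lowercase whitespace tokenization; real Twitter text
    would need handling of hashtags, mentions, URLs, and misspellings.
    """
    vectors = {}
    for post in posts:
        tokens = post.lower().split()
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(tok, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(seed, vectors):
    """Rank every other term by similarity of its contexts to the seed's contexts."""
    seed_vec = vectors[seed]
    scores = [(term, cosine(seed_vec, vec))
              for term, vec in vectors.items() if term != seed]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Given a collection of posts, `rank_similar("headache", context_vectors(posts))` would return candidate terms ordered by how closely their contexts resemble those of "headache". Terms near the top of such a ranking are candidates for lay expressions of the same concept and would still need manual review.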