Stigma Annotation Scheme and Stigmatized Language Detection in Health-Care Discussions on Social Media

Much research in the social sciences has examined the interpretation of stigma and its influence on human behaviour and health. Stigma results in out-of-group exclusion, distancing, cognitive separation, status loss, discrimination, and in-group pressure, and it often leads to disengagement and non-adherence to treatment plans and doctors' prescriptions. However, little work has been conducted on the computational identification of stigma in general, and in social media discourse in particular. In this paper, we develop an annotation scheme and improve the annotation process for stigma identification, both of which can be applied to other health-care domains. Data from pro-vaccination and anti-vaccination discussion groups are annotated by two groups. The first consists of trained annotators with professional backgrounds in social science and health-care studies, who can therefore be considered experts on the subject relative to a non-expert crowd. The second consists of Amazon MTurk annotators; because nothing is known about their educational background, they are initially treated as a non-expert crowd on the subject of stigma. We analyze the annotations with visualisation techniques and with features from the LIWC (Linguistic Inquiry and Word Count) list, and we make predictions based on bi-grams using traditional and deep learning models. Data augmentation combined with a CNN yields higher accuracy than the other models. The high prediction rate achieved with the CNN further supports the success of the rigorous annotation process in identifying stigma.
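
As a rough illustration of the bi-gram features mentioned above, the following minimal Python sketch counts adjacent word pairs across a set of posts. The function names, the whitespace tokenisation, and the sample posts are assumptions made for illustration; they are not taken from the paper's pipeline.

```python
from collections import Counter


def bigrams(text):
    """Return adjacent lowercase token pairs from whitespace-split text.

    Whitespace tokenisation is an assumption; the paper does not
    specify its tokeniser here.
    """
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def bigram_counts(posts):
    """Aggregate bi-gram counts over an iterable of post strings."""
    counts = Counter()
    for post in posts:
        counts.update(bigrams(post))
    return counts


# Hypothetical example posts, not drawn from the annotated data.
posts = ["vaccines are safe", "vaccines are not safe"]
counts = bigram_counts(posts)
```

Counts of this kind could then be turned into feature vectors for a classifier; which classifier and which vectorisation the authors used is described only at the level of "traditional and deep learning models".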
