Finding People with Emotional Distress in Online Social Media: A Design Combining Machine Learning and Rule-Based Classification

Many people experience emotional distress, and early detection of high-risk individuals is key to preventing suicidal behavior. There is increasing evidence that the Internet and social media provide clues to people’s emotional distress. In particular, some people post messages showing emotional distress, or even suicide notes, on the Internet. Identifying emotionally distressed people and examining their online posts are important steps for health and social work professionals to provide assistance, but the process is very time-consuming and ineffective if conducted manually using standard search engines. Following the design science approach, we present the design of a system called KAREN, which identifies individuals who blog about their emotional distress in the Chinese language, using a combination of machine learning classification and rule-based classification with rules obtained from experts. A controlled experiment and a user study were conducted to evaluate system performance in searching and analyzing blogs written by people who might be emotionally distressed. The results show that the proposed system achieved better classification performance than the benchmark methods, and that professionals perceived the system to be more useful and effective than the benchmark approaches for identifying bloggers with emotional distress.
