Identifying Chinese Microblog Users With High Suicide Probability Using Internet-Based Profile and Linguistic Features: Classification Model

Background Traditional offline assessment of suicide probability is time consuming, and it is difficult to convince at-risk individuals to participate. Identifying individuals with high suicide probability through online social media is more efficient and can reach otherwise hidden individuals, yet little research has focused on this specific field.
Objective The objective of this study was to apply two classification models, Simple Logistic Regression (SLR) and Random Forest (RF), to examine the feasibility and effectiveness of identifying microblog users in China with high suicide probability through profile and linguistic features extracted from Internet-based data.
Methods A total of 909 Chinese microblog users completed an Internet survey. Those scoring one SD above the mean of the total Suicide Probability Scale (SPS) score, and those scoring one SD above the mean on each of the four subscales, were labeled as high-risk individuals for overall suicide probability and for the corresponding dimension, respectively; all means and SDs were computed within the participant sample. Profile and linguistic features were fed into two machine learning algorithms (SLR and RF) to train models that identify high-risk individuals in general suicide probability and in each of its four dimensions. Models were trained and tested using 5-fold cross-validation, in which both the training set and the test set were generated from the whole sample by stratified random sampling. Three classic performance metrics (Precision, Recall, and F1 measure) and a specifically defined metric, “Screening Efficiency,” were adopted to evaluate model effectiveness.
Results Classification performance was generally comparable between SLR and RF. With the best-performing classification models, we were able to retrieve over 70% of the labeled high-risk individuals in overall suicide probability as well as in the four dimensions. Screening Efficiency of most models varied from 1/4 to 1/2. Precision of the models was generally below 30%.
Conclusions Individuals in China with high suicide probability are recognizable by profile and text-based information from microblogs. Although there is still much room to improve the performance of the classification models, this study may shed light on preliminary screening of at-risk individuals via machine learning algorithms, which can work side by side with expert scrutiny to increase efficiency in large-scale surveillance of suicide probability from online social media.
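
The labeling rule, stratified fold assignment, and classic metrics named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the use of the sample SD, the strict "greater than" cutoff, and the round-robin fold assignment are all assumptions made for illustration, and Screening Efficiency is omitted because its definition is not given here.

```python
import random
from statistics import mean, stdev


def label_high_risk(scores):
    """Flag scores more than one SD above the sample mean.

    Assumption: sample SD and a strict '>' comparison; the source
    only says 'one SD above the mean'.
    """
    cutoff = mean(scores) + stdev(scores)
    return [s > cutoff for s in scores]


def stratified_folds(labels, k=5, seed=0):
    """Assign each item a fold index in 0..k-1, stratified by label.

    Assumption: items of each class are shuffled and dealt round-robin,
    so every fold keeps roughly the same share of high-risk cases.
    """
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in (True, False):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % k
    return folds


def precision_recall_f1(y_true, y_pred):
    """Precision, Recall, and F1 for the positive (high-risk) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

In this sketch, a model that recovers most true high-risk users while also flagging many others yields high Recall and low Precision, which is the pattern reported in the Results.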
