Finding warning markers: Leveraging natural language processing and machine learning technologies to detect risk of school violence

INTRODUCTION School violence has a far-reaching effect, impacting the entire school population, including staff, students, and their families. Among youth attending the most violent schools, studies have reported higher dropout rates, poor school attendance, and poor scholastic achievement. It has been noted that the largest crime-prevention results occurred when youth at elevated risk were given an individualized prevention program. However, much work is still needed to establish an effective approach for identifying at-risk subjects.

OBJECTIVE In our earlier research, we developed a risk assessment program to interview subjects, identify risk and protective factors, and evaluate risk for school violence. This study focused on developing natural language processing (NLP) and machine learning technologies to automate the risk assessment process.

MATERIAL AND METHODS We prospectively recruited 131 students with or without behavioral concerns from 89 schools between 05/01/2015 and 04/30/2018. The subjects were interviewed with two risk assessment scales and a questionnaire, and their risk of violence was determined by pediatric psychiatrists based on clinical judgment. Using NLP technologies, different types of linguistic features were extracted from the interview content. Machine learning classifiers were then applied to predict risk of school violence for individual subjects. A two-stage feature selection was implemented to identify violence-related predictors. Performance was validated against the psychiatrist-generated reference standard of risk levels, using positive predictive value (PPV), sensitivity (SEN), negative predictive value (NPV), specificity (SPEC), and area under the ROC curve (AUC).

RESULTS Compared with using subjects' sociodemographic information alone, the use of linguistic features significantly improved the classifiers' predictive performance (P < 0.01). The best-performing classifier with n-gram features achieved 86.5%/86.5%/85.7%/85.7%/94.0% (PPV/SEN/NPV/SPEC/AUC) on the cross-validation set and 83.3%/93.8%/91.7%/78.6%/94.6% (PPV/SEN/NPV/SPEC/AUC) on the test data. The feature selection process identified a set of predictors covering the discussion of subjects' thoughts, perspectives, behaviors, individual characteristics, peer and family dynamics, and protective factors.

CONCLUSIONS By analyzing the content of subject interviews, the NLP and machine learning algorithms showed good capacity for detecting risk of school violence. The feature selection uncovered multiple warning markers that could deliver useful clinical insights to help personalize interventions. Consequently, the developed approach offers the promise of an accurate and scalable computerized screening service for preventing school violence.
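
For readers less familiar with the four count-based measures reported above, the following is a minimal Python sketch of how PPV, SEN, NPV, and SPEC are derived from a 2x2 confusion matrix. The names `ConfusionCounts` and `screening_metrics` are hypothetical and are not taken from the study. The counts in the example (15 true positives, 3 false positives, 1 false negative, 11 true negatives) are an assumption: the abstract does not report counts, and these are simply one set of values that reproduces the test-set percentages after rounding. AUC is not shown because it requires per-subject classifier scores rather than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary classifier's decisions against a reference standard."""
    tp: int  # high-risk subjects correctly flagged
    fp: int  # low-risk subjects incorrectly flagged
    fn: int  # high-risk subjects missed
    tn: int  # low-risk subjects correctly cleared


def screening_metrics(c: ConfusionCounts) -> dict:
    """Return PPV, SEN, NPV, and SPEC as fractions in [0, 1]."""
    return {
        "PPV": c.tp / (c.tp + c.fp),   # of those flagged, fraction truly high risk
        "SEN": c.tp / (c.tp + c.fn),   # of those truly high risk, fraction flagged
        "NPV": c.tn / (c.tn + c.fn),   # of those cleared, fraction truly low risk
        "SPEC": c.tn / (c.tn + c.fp),  # of those truly low risk, fraction cleared
    }


# Assumed counts (not reported in the abstract), chosen because they
# reproduce the test-set percentages after rounding to one decimal place.
example = ConfusionCounts(tp=15, fp=3, fn=1, tn=11)
metrics = screening_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because PPV and NPV depend on how common high-risk subjects are in the sample, the same classifier could yield different predictive values in a population with a different prevalence, whereas SEN and SPEC are defined within each reference class.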
