Using text mining and machine learning for detection of child abuse

Abuse in any form is a grave threat to a child's health. Public health institutions in the Netherlands try to identify and prevent different kinds of abuse, and a decision support system can help them achieve this goal. Such decision support relies on the analysis of relevant child health data. A significant part of the medical data that these institutions hold on children is unstructured, in the form of free-text notes. In this research, we employ machine learning and text mining techniques to detect patterns of possible child abuse in these data. The resulting model achieves a high score in classifying cases of possible abuse. We then describe our implementation of the decision support API at a municipality in the Netherlands.
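To make the idea of classifying free-text notes concrete, the following is a minimal sketch and not the model used in this research. The abstract does not name a specific classifier, so a multinomial Naive Bayes over word counts is assumed here purely for illustration. The class name `NaiveBayesTextClassifier`, the labels `flag` and `no_flag`, and the four English training notes are all hypothetical and invented for the example; they are not drawn from any real data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a note and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def fit(self, notes, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for note, label in zip(notes, labels):
            self.word_counts[label].update(tokenize(note))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, note):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(note) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)  # smoothed likelihood
            scores[label] = score
        return scores

    def predict(self, note):
        scores = self.log_scores(note)
        return max(scores, key=scores.get)


# Invented toy notes, for illustration only.
train_notes = [
    "unexplained bruises on arms, child withdrawn and fearful",
    "repeated injuries, parent gives inconsistent explanation",
    "routine checkup, growth and development normal",
    "vaccination given, child cheerful, no concerns",
]
train_labels = ["flag", "flag", "no_flag", "no_flag"]

model = NaiveBayesTextClassifier().fit(train_notes, train_labels)
print(model.predict("child has bruises and seems fearful"))
print(model.predict("routine vaccination, development normal"))
```

A real system would need Dutch-language preprocessing, handling of class imbalance, and proper evaluation on held-out data; the sketch only shows how word evidence in a note can be turned into a class decision that a decision support tool could surface to a professional.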
