Identifying child abuse through text mining and machine learning

In this paper, we describe how we used text mining and analysis to identify and predict cases of child abuse in a public health institution. Such institutions in the Netherlands try to identify and prevent different kinds of abuse. A significant part of the medical data that these institutions hold on children is unstructured, in the form of free-text notes. We explore whether these consultation data contain meaningful patterns that indicate abuse. We then train machine learning models on cases of abuse as determined by over 500 child specialists from a municipality in the Netherlands. The resulting model achieves a high score in classifying cases of possible abuse. We methodically evaluate and compare the performance of the classifiers. Finally, we describe our implementation of the decision support API at a municipality in the Netherlands.
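As a rough illustration of the kind of pipeline described above (free-text notes turned into word features, then a classifier trained on labelled cases), the following is a minimal sketch. It assumes a bag-of-words representation and a multinomial Naive Bayes classifier with Laplace smoothing; the abstract does not state which features or classifiers were used, so these choices, the function names, and the toy notes are all hypothetical and are not taken from the study data.

```python
import math
from collections import Counter

# Hypothetical toy notes, invented for illustration only (1 = possible abuse, 0 = no concern).
notes = [
    "child has unexplained bruises on arms",
    "parent reports hitting at home",
    "child appears fearful and withdrawn",
    "routine vaccination child healthy",
    "normal growth no concerns raised",
    "hearing test passed development on track",
]
labels = [1, 1, 1, 0, 0, 0]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def train_naive_bayes(texts, y):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter(y)
    word_counts = {c: Counter() for c in class_docs}
    for text, label in zip(texts, y):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"class_docs": class_docs, "word_counts": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the class with the highest log-posterior under Naive Bayes."""
    total_docs = sum(model["class_docs"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_docs"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # class prior
        for word in tokenize(text):
            if word in model["vocab"]:  # ignore words never seen in training
                # Laplace (add-one) smoothed word likelihood
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(notes, labels)
print(predict(model, "unexplained bruises reported"))
print(predict(model, "routine vaccination"))
```

In practice, such a classifier would be evaluated on held-out notes and compared against alternative models, and its output would serve only as decision support for a specialist rather than as a determination of abuse.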
