Social-Child-Case Document Clustering based on Topic Modeling using Latent Dirichlet Allocation

Children are the future of the nation, and the treatment and education they receive shape that future. Today, children face many kinds of social problems. To find the right solution to a given problem, social workers usually consult social-child-case (SCC) documents to find similar past cases and adapt their solutions. However, reading through a large number of documents to find similar cases is tedious and time-consuming. Hence, this work aims to categorize those documents into groups according to case type. We use topic modeling with Latent Dirichlet Allocation (LDA) to extract topics from the documents and group the documents by topic similarity. The coherence score and the perplexity graph are used to select the best model. The result is a model with 5 topics that match the targeted case types. This supports the reuse of knowledge about SCC handling by making it easier to find documents on similar cases.
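The pipeline described above (fit LDA for several topic counts, compare perplexity and coherence, then assign each document to its dominant topic) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' setup: the abstract does not name the inference method, the coherence measure, or the software used, so the sketch uses collapsed Gibbs sampling, UMass coherence, and pure standard-library Python. The toy corpus, the hyperparameters `alpha` and `beta`, and the helper names `fit_lda`, `perplexity`, and `umass_coherence` are all hypothetical.

```python
import math
import random

def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. Returns (theta, phi, word_id)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # times word v is assigned to topic k
    topic_total = [0] * num_topics                      # total tokens assigned to topic k
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)
    # Resample each token's topic from its conditional distribution.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    # Smoothed point estimates of document-topic and topic-word distributions.
    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
              for k in range(num_topics)] for d, doc in enumerate(docs)]
    phi = [[(topic_word[k][v] + beta) / (topic_total[k] + V * beta)
            for v in range(V)] for k in range(num_topics)]
    return theta, phi, word_id

def perplexity(docs, theta, phi, word_id):
    """exp of the negative mean per-token log-likelihood (lower is better)."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][k] * phi[k][word_id[w]] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)

def umass_coherence(docs, phi, word_id, top_n=3):
    """Mean UMass coherence over topics, from document co-occurrence counts."""
    id_word = {i: w for w, i in word_id.items()}
    doc_sets = [set(doc) for doc in docs]
    scores = []
    for row in phi:
        order = sorted(range(len(row)), key=lambda i: -row[i])[:top_n]
        top = [id_word[i] for i in order]
        score = 0.0
        for m in range(1, len(top)):
            for l in range(m):
                both = sum(1 for s in doc_sets if top[m] in s and top[l] in s)
                df = sum(1 for s in doc_sets if top[l] in s)
                score += math.log((both + 1) / df)
        scores.append(score)
    return sum(scores) / len(scores)

# Hypothetical, already-tokenized toy corpus (not the SCC data).
docs = [
    "neglect parent abandon home care".split(),
    "parent neglect care home shelter".split(),
    "theft police court law trial".split(),
    "court law police trial theft".split(),
    "school bully teacher class student".split(),
    "teacher school student bully class".split(),
]

results = {}
for k in (2, 3, 4):
    theta, phi, word_id = fit_lda(docs, k)
    # Cluster label of a document = index of its most probable topic.
    labels = [max(range(k), key=row.__getitem__) for row in theta]
    results[k] = (perplexity(docs, theta, phi, word_id),
                  umass_coherence(docs, phi, word_id), labels)
    print(k, round(results[k][0], 2), round(results[k][1], 2), labels)
```

Plotting the two scores against the number of topics gives the kind of graph the abstract refers to. A common heuristic is to prefer a topic count where coherence is high and perplexity has stopped dropping sharply. On a real corpus the perplexity would normally be computed on held-out documents rather than on the training documents as in this sketch.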
