Latent Dirichlet allocation based blog analysis for criminal intention detection system

Crime imposes several costs on society, including direct economic costs, victim costs, and other intangible costs. Recently, research on computerized systems that address the problem of crime has attracted growing interest. In this work, we propose a criminal intention detection system whose objective is to detect the intention to commit a crime by analyzing the content of text documents from article sources found on the Internet. Crime intention can be detected from a collection of documents if the topic of each text is properly categorized. The proposed early-warning system detects crime activity intention using latent Dirichlet allocation (LDA) and a collaborative representation classifier (CRC), and it involves two stages. In the first stage, we employ LDA as a feature learning method to extract representations of the documents in the article sources. In the second stage, we use the features extracted by LDA to construct an overcomplete dictionary for CRC, which solves an l2-norm optimization problem to find the related topic for a new test document. Compared with the l1-norm optimization problem in the sparse representation classifier (SRC), the l2-norm formulation in CRC obtains accuracy similar to that of SRC with greatly reduced time complexity. The experimental results show that our proposed method achieves higher accuracy than the traditional method.
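
The second stage can be illustrated with a minimal NumPy sketch. It assumes CRC in its common regularized least squares form: the test vector is coded over the whole dictionary by ridge regression (the l2-norm problem), and the class whose atoms give the smallest normalized reconstruction residual is chosen. The abstract does not state the exact formulation, so the function name `crc_classify`, the regularization weight `lam`, the class names, and the toy topic-proportion vectors standing in for LDA output are all illustrative assumptions.

```python
import numpy as np


def crc_classify(D, labels, y, lam=0.01):
    """Classify y with a collaborative representation classifier (sketch).

    D      : (n_topics, n_atoms) dictionary; each column is the topic
             vector of one training document (assumed to come from LDA).
    labels : (n_atoms,) class label of each dictionary column.
    y      : (n_topics,) topic vector of the test document.
    lam    : ridge (l2) regularization weight; 0.01 is an arbitrary choice.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    y = np.asarray(y, dtype=float)

    # Closed-form solution of min_a ||y - D a||_2^2 + lam * ||a||_2^2.
    gram = D.T @ D + lam * np.eye(D.shape[1])
    alpha = np.linalg.solve(gram, D.T @ y)

    best_label, best_score = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        # Reconstruct y using only the atoms (and coefficients) of class c.
        residual = np.linalg.norm(y - D[:, mask] @ alpha[mask])
        # Normalize by the coefficient energy of the class.
        score = residual / (np.linalg.norm(alpha[mask]) + 1e-12)
        if score < best_score:
            best_label, best_score = str(c), score
    return best_label


# Toy dictionary: 3 topics, 4 training documents (more atoms than
# dimensions, i.e. overcomplete). The values are made up for illustration.
D = np.array([
    [0.8, 0.7, 0.1, 0.1],
    [0.1, 0.2, 0.8, 0.1],
    [0.1, 0.1, 0.1, 0.8],
])
labels = np.array(["crime", "crime", "other", "other"])

print(crc_classify(D, labels, np.array([0.75, 0.15, 0.10])))
print(crc_classify(D, labels, np.array([0.10, 0.70, 0.20])))
```

Because the coding step has a closed-form solution, it reduces to a single linear solve per test document, whereas the l1-norm problem in SRC generally requires an iterative solver. This is one way to read the reduced time complexity mentioned above.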
