Text analysis framework for understanding cyber-crimes

Article history: Received 17 February 2017; received in revised form 11 August 2017; accepted 19 August 2017.

The impact of cyber-crimes is much greater than that of other crimes and can be felt at the personal, societal, national, and global levels. According to studies, developing countries are at greater risk from such crimes. The fight against cyber-crime requires a strategic and intelligent framework. This paper discusses a text analysis framework that uses Natural Language Processing (NLP) and text mining techniques to extract crime-related information, which can be used for education and awareness campaigns and for further knowledge-based analysis. News articles crawled from a leading newspaper website in India are used as the source of cybercrime data. Part-of-speech (POS) tagging is used to extract important terms and concepts related to cybercrimes, and term association analysis is used to understand the relationships among the extracted terms.
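To make the term association step concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which association measure is used, so this sketch assumes the Pearson correlation between the per-document counts of two terms. The documents in `DOCS`, the function names, and the `min_corr` threshold are all hypothetical. The POS tagging step is left out because it needs a trained tagger; here every word token is treated as a candidate term.

```python
import math
import re
from collections import Counter

# Hypothetical toy headlines standing in for crawled news articles.
DOCS = [
    "phishing email stole bank password",
    "phishing email targeted bank customers",
    "ransomware locked hospital files",
    "ransomware attack encrypted files",
]


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def find_associations(docs, term, min_corr=0.5):
    """Return terms whose per-document counts correlate with `term` at or above `min_corr`."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    target = [c[term] for c in counts]
    scores = {}
    for other in vocabulary:
        if other == term:
            continue
        r = pearson(target, [c[other] for c in counts])
        if r >= min_corr:
            scores[other] = round(r, 2)
    # Strongest associations first; ties keep alphabetical order.
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))


if __name__ == "__main__":
    print(find_associations(DOCS, "phishing", min_corr=0.9))
```

In a fuller pipeline, the vocabulary would presumably be limited to the terms selected by POS tagging (for example, nouns and noun phrases) before the associations are computed.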
