A pragmatic validation of stylometric techniques using BPA

Of the many modes of communication, electronic communication is now the most prominent, and the Internet is the backbone for all of it. In digital forensics, a central question is who authored a document: the author's identity, their demographic background, and how they are linked to other documents. Author identification of messages and non-repudiation are therefore major challenges in digital forensic investigation. In this paper we use stylometry-based extraction of human writing features as a solution to the author identification problem. Stylometry not only identifies human writing patterns but can also be used for gender identification. This paper highlights some ways of handling problems such as anonymous email messages and email abuse, and of supporting digital forensics. We collected 62 stylistic features for different users with a program written in C. For each user, 22 samples of 150 words were used to train a neural network with the Back Propagation Algorithm (BPA). In different variations of the experimental setup, an accuracy of 98.312% was achieved.
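The abstract does not list the 62 stylistic features, so the following is only a minimal sketch of what per-sample feature extraction in C might look like. The struct `StyleFeatures`, the function `extract_features`, and the seven features chosen are assumptions for illustration (common character-level and word-level stylometric measures), not the authors' actual feature set or code.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical feature vector: a small illustrative subset,
 * not the 62 features used in the paper. */
typedef struct {
    size_t chars;        /* total characters, including whitespace */
    size_t letters;      /* alphabetic characters */
    size_t uppercase;    /* uppercase letters */
    size_t digits;       /* decimal digits */
    size_t punctuation;  /* punctuation characters */
    size_t words;        /* whitespace-delimited tokens */
    double avg_word_len; /* mean non-whitespace characters per token */
} StyleFeatures;

/* Scan a NUL-terminated text sample once and fill in the counts. */
StyleFeatures extract_features(const char *text)
{
    StyleFeatures f = {0};
    size_t token_chars = 0; /* non-whitespace characters seen so far */
    int in_word = 0;        /* 1 while the scan is inside a token */

    for (const char *p = text; *p != '\0'; ++p) {
        /* ctype functions require a value representable as unsigned char */
        unsigned char c = (unsigned char)*p;

        f.chars++;
        if (isalpha(c)) f.letters++;
        if (isupper(c)) f.uppercase++;
        if (isdigit(c)) f.digits++;
        if (ispunct(c)) f.punctuation++;

        if (isspace(c)) {
            in_word = 0;
        } else {
            if (!in_word) {   /* first character of a new token */
                f.words++;
                in_word = 1;
            }
            token_chars++;
        }
    }

    f.avg_word_len = f.words ? (double)token_chars / (double)f.words : 0.0;
    return f;
}
```

In a setup like the one described, each 150-word sample would be passed through a function of this kind, and the resulting numbers (typically normalized, for example by sample length) would form one input vector for the neural network. Tokens are split on whitespace only, so attached punctuation counts toward word length; a real feature set would likely treat that differently.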
