Identifying Authorship from Linguistic Text Patterns

Analyzing linguistic patterns in text is challenging because text is unstructured. This research presents a methodology for comparing two texts to determine whether they were written by the same author or by different authors. The methodology includes an algorithm, based on Zipf’s Law [47][48], that analyzes the proximity of the texts. The results have implications for text mining, with applications in areas such as forensics, natural language processing, and information retrieval.
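The abstract does not spell out the proximity algorithm, so the following is only an illustrative sketch of how a Zipf-style comparison of two texts might look, not the authors' method. It assumes a simple regex tokenizer, estimates the rank-frequency slope that Zipf's Law predicts to be near -1, and defines a hypothetical proximity score (`profile_distance`) as the mean absolute gap in relative frequency over the most common words. All function names and the `top_n` parameter are assumptions introduced here.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its word tokens (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    Zipf's Law predicts a slope near -1 for natural-language text.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def profile_distance(text_a, text_b, top_n=50):
    """Hypothetical proximity score between two texts.

    Mean absolute gap in relative frequency, taken over the top_n most
    frequent words of the two texts combined. Smaller means closer.
    """
    counts_a, counts_b = word_counts(text_a), word_counts(text_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both texts must contain at least one word")
    vocab = [word for word, _ in (counts_a + counts_b).most_common(top_n)]
    gaps = [abs(counts_a[w] / total_a - counts_b[w] / total_b) for w in vocab]
    return sum(gaps) / len(gaps)
```

Under these assumptions, two texts with similar word-usage profiles yield a small `profile_distance`, and identical texts yield zero. Any threshold for deciding "same author" would have to be calibrated on texts of known authorship, which this sketch does not attempt.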

[1]  Lisa F. Rau, et al. Information extraction and text summarization using linguistic knowledge acquisition, 1989, Inf. Process. Manag.

[2]  Bruno S. Frey, et al. Fighting Political Terrorism by Refusing Recognition, 1987, Journal of Public Policy.

[3]  Jason Weston, et al. A unified architecture for natural language processing: deep neural networks with multitask learning, 2008, ICML '08.

[4]  H. Thomas. The Shakespeare Authorship Controversy, 1932.

[5]  Campbell B. Read, et al. Zipf's Law, 2004.

[6]  B. J. Phillips, et al. Terrorist Group Cooperation and Longevity, 2013.

[7]  Carl-Erik Särndal, et al. On Deciding Cases of Disputed Authorship, 1967.

[8]  Marina Apaydin, et al. A Multi-Dimensional Framework of Organizational Innovation: A Systematic Review of the Literature, 2010.

[9]  Eloi Le Divenach. English in the news, 1986.

[10]  Ron Weber, et al. Toward a Theory of the Deep Structure of Information Systems, 1990, ICIS.

[11]  B. M. Hill, et al. Zipf's Law and Prior Distributions for the Composition of a Population, 1970.

[12]  R. Walgate. Tale of two cities, 1984, Nature.

[13]  Samir Chatterjee, et al. A Design Science Research Methodology for Information Systems Research, 2008.

[14]  Barbara J. Grosz, et al. Natural-Language Processing, 1982, Artificial Intelligence.

[15]  Shirley Gregor, et al. What’s new about digital innovation?, 2016.

[16]  Michael K. Buckland, et al. Annual Review of Information Science and Technology, 2006, J. Documentation.

[17]  Roy T. Cook. What is a Truth Value And How Many Are There?, 2009, Stud Logica.

[18]  Eva Salzman. The Plagiarist, 2015.

[19]  Thorsten Brants, et al. Natural Language Processing in Information Retrieval, 2003, CLIN.

[20]  Stuart Hannabuss, et al. The Big Switch: Rewiring the World, from Edison to Google, 2009.

[21]  P. Portner, et al. What is Meaning?: Fundamentals of Formal Semantics, 2005.

[22]  Vidyasagar Potdar, et al. Computational approaches for emotion detection in text, 2010, 4th IEEE International Conference on Digital Ecosystems and Technologies.

[23]  Soumalya Ghosh, et al. Cost of error correction quantification with Bengali text transcription, 2012, 2012 4th International Conference on Intelligent Human Computer Interaction (IHCI).

[24]  Michael Collins, et al. Convolution Kernels for Natural Language, 2001, NIPS.

[25]  Laureano Luna. Indefinite Extensibility in Natural Language, 2013.

[26]  Stéfan Sinclair, et al. The Measured Words: How Computers Analyze Text, 2016.

[27]  Philippe Schlenker. Super Liars, 2010, Rev. Symb. Log.

[28]  Peter A. Beling, et al. Machine quantification of text-based economic reports for use in predictive modeling, 2003, SMC'03 Conference Proceedings. 2003 IEEE International Conference on Systems, Man and Cybernetics. Conference Theme - System Security and Assurance (Cat. No.03CH37483).

[29]  Ricard V. Solé, et al. Least effort and the origins of scaling in human language, 2003, Proceedings of the National Academy of Sciences of the United States of America.

[30]  George Kingsley Zipf, et al. Human behavior and the principle of least effort, 1949.

[31]  Rosalind Barber. Shakespeare Authorship Doubt in 1593, 2009.

[32]  Justin Conrad, et al. Interstate Rivalry and Terrorism, 2011.

[33]  George Kingsley Zipf, et al. The Psychobiology of Language, 2022.

[34]  Alan R. Hevner, et al. Positioning and Presenting Design Science Research for Maximum Impact, 2013.

[35]  R. Harald Baayen, et al. Statistical models for word frequency distributions: A linguistic evaluation, 1992, Comput. Humanit.

[36]  Kalle Lyytinen, et al. Organizing for Innovation in the Digitized World, 2012, Organ. Sci.