The creation of Base Rate Knowledge of linguistic variables and the implementation of likelihood ratios to authorship attribution in forensic text comparison

This article addresses one of the main research challenges facing Forensic Linguistics in the 21st century: comparing texts of unknown authorship with the same reliability as other disciplines that evaluate forensic evidence. It implements advanced statistical techniques in forensic text comparison to improve the reliability of linguistic evidence presented in court and to assess its significance. The first part of the analysis creates a Base Rate Knowledge for some of the most relevant linguistic variables in Peninsular Spanish texts. The second part applies statistical tests to the variables with discriminatory potential in order to identify the authors' samples, and assesses the reliability of the results through a posteriori classification. The third part implements the likelihood-ratio framework, which improves the reliability of linguistic evidence provided in court and yields probabilistic results that assist not only the judge and jury but also the linguistic expert, who can then carry out more rigorous testing and more extensive performance analysis of the data.
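
As a rough illustration of the likelihood-ratio idea, the sketch below divides the probability density of a questioned text's feature value under the same-author hypothesis by its density under the different-author hypothesis, with the reference population standing in for the Base Rate Knowledge. It is a minimal sketch and not the procedure used in the article: the single feature, the Gaussian modelling of both distributions, the function names, and all numeric values are assumptions made for illustration only.

```python
import math
from statistics import mean, stdev


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def likelihood_ratio(questioned_value, suspect_samples, population_samples):
    """LR = p(E | same author) / p(E | different author).

    Assumption: both distributions are modelled as Gaussians fitted to
    the samples; real casework would need a validated model.
    """
    numerator = normal_pdf(
        questioned_value, mean(suspect_samples), stdev(suspect_samples)
    )
    denominator = normal_pdf(
        questioned_value, mean(population_samples), stdev(population_samples)
    )
    return numerator / denominator


# Hypothetical relative frequencies of one linguistic variable (made-up data).
suspect = [4.1, 3.9, 4.3, 4.0, 4.2]                       # known texts by the suspect
population = [2.0, 3.5, 5.1, 2.8, 4.4, 3.1, 6.0, 2.5]     # reference population

lr = likelihood_ratio(4.1, suspect, population)
log10_lr = math.log10(lr)  # positive values support the same-author hypothesis
print(f"LR = {lr:.2f}, log10(LR) = {log10_lr:.2f}")
```

A ratio above 1 favours the same-author hypothesis and a ratio below 1 favours the different-author hypothesis; reporting the ratio on a log10 scale makes the two directions symmetric around zero.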
