Tasks such as Authorship Attribution, Intrinsic Plagiarism Detection and Sexual Predator Identification are representative of attempts to deceive. In the first two, authors try to convince others that the presented work is theirs, and in the third there is an attempt to convince readers to take actions based on false beliefs or ill-perceived risks. In this paper, we discuss our approaches to these tasks in the Author Identification track at PAN2012, which represents our first proper attempt at any of them. Our initial intention was to determine whether cues of deception, documented in the literature, might be relevant to such tasks. However, it quickly became apparent that such cues would not be readily useful, and we discuss the results achieved using some simple but relatively novel approaches: for the Traditional Authorship Attribution task, we show how a mean-variance framework using just 10 stopwords achieves 42.8%, and could achieve 52.12% using fewer; for Intrinsic Plagiarism Detection, frequent words achieved 91.1% overall; and for Sexual Predator Identification, we used just a few features covering requests for personal information, with mixed results.
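To make the mean-variance idea concrete, the sketch below shows one plausible reading of such a framework: each author is profiled by the mean and variance of relative frequencies over a small, fixed stopword list, and an unknown document is attributed to the author with the nearest profile. The particular stopword list, the variance-weighted distance, and all function names are illustrative assumptions, not the exact configuration used in the paper.

```python
# Hypothetical sketch of a mean-variance stopword framework for authorship
# attribution. The stopword list, normalisation, and distance measure are
# assumptions for illustration, not the authors' exact settings.
from collections import Counter

# Assumed 10-stopword list; the abstract does not enumerate its stopwords.
STOPWORDS = ["the", "of", "and", "a", "in", "to", "is", "was", "it", "for"]

def stopword_profile(text):
    """Relative frequency of each stopword in one document."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in STOPWORDS]

def author_model(documents):
    """Per-stopword mean and variance over an author's known documents."""
    profiles = [stopword_profile(d) for d in documents]
    n = len(profiles)
    cols = list(zip(*profiles))  # one tuple of values per stopword dimension
    means = [sum(c) / n for c in cols]
    variances = [sum((x - m) ** 2 for x in c) / n for c, m in zip(cols, means)]
    return means, variances

def attribute(unknown, models):
    """Assign the unknown document to the author whose mean profile is
    closest, weighting each dimension by the inverse of its variance."""
    profile = stopword_profile(unknown)

    def distance(model):
        means, variances = model
        return sum((p - m) ** 2 / (v + 1e-9)
                   for p, m, v in zip(profile, means, variances))

    return min(models, key=lambda author: distance(models[author]))

# Usage (hypothetical data):
#   models = {"authorA": author_model(docs_a), "authorB": author_model(docs_b)}
#   print(attribute(unknown_text, models))
```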