Grammar Checker Features for Author Identification and Author Profiling: Notebook for PAN at CLEF 2013

Our work on author identification and author profiling is based on the question: can the number and types of grammatical errors serve as indicators for a specific author or a group of people? To detect grammatical errors, we base our approach on the output of the open-source library LanguageTool. For author identification, we transform the problem into a statistical test in which an unknown document is attributed to a different author when its distribution of grammatical errors deviates from that of the documents in a reference corpus. For author profiling, we implemented an instance-based classification approach, namely a k-NN classifier, in combination with a language model: a text is assigned to the age or gender group whose reference corpus contains the closest match. In the evaluation, we found that in both scenarios grammatical errors perform better than the baseline and capture an aspect of writing style that is not covered by more traditional features such as stylometric features or word n-grams.
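To make the author identification idea concrete, the following is a minimal sketch of how a deviation in the error distribution could be turned into a decision. It assumes that error counts per category have already been extracted (for example from LanguageTool output). The category labels, the Pearson chi-square statistic, the add-one smoothing, and the threshold are illustrative assumptions, not necessarily the test used in this work.

```python
from collections import Counter


def error_distribution(counts, categories, alpha=1.0):
    """Smoothed relative frequency of each error category (add-alpha smoothing)."""
    total = sum(counts.get(c, 0) for c in categories) + alpha * len(categories)
    return {c: (counts.get(c, 0) + alpha) / total for c in categories}


def chi_square_statistic(unknown_counts, reference_counts):
    """Pearson chi-square statistic of the unknown document's error counts
    against the distribution estimated from the reference corpus."""
    categories = sorted(set(unknown_counts) | set(reference_counts))
    n = sum(unknown_counts.get(c, 0) for c in categories)
    if n == 0:
        # No errors observed in the unknown document: no evidence of deviation.
        return 0.0
    ref_dist = error_distribution(reference_counts, categories)
    stat = 0.0
    for c in categories:
        expected = n * ref_dist[c]  # always > 0 because of smoothing
        observed = unknown_counts.get(c, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


def same_author(unknown_counts, reference_counts, threshold):
    """Accept the reference author unless the statistic exceeds the threshold."""
    return chi_square_statistic(unknown_counts, reference_counts) <= threshold


# Hypothetical error-category counts; the labels are placeholders.
reference = Counter({"TYPOS": 40, "GRAMMAR": 10, "PUNCTUATION": 50})
similar = Counter({"TYPOS": 8, "GRAMMAR": 2, "PUNCTUATION": 10})
different = Counter({"TYPOS": 1, "GRAMMAR": 18, "PUNCTUATION": 1})

# 5.99 is approximately the chi-square critical value for 2 degrees of
# freedom at the 0.05 level; it is used here only as an example threshold.
THRESHOLD = 5.99
```

With these toy counts, `same_author(similar, reference, THRESHOLD)` accepts the reference author, while `same_author(different, reference, THRESHOLD)` rejects it, because the second document's errors are concentrated in a category that is rare in the reference corpus.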