"RusPersonality": A Russian corpus for authorship profiling and deception detection

Authorship profiling, the process of extracting information about the unknown author of a text (demographics, psychological traits, etc.) from an analysis of its linguistic parameters, is a problem of great importance. Research in authorship profiling has always been constrained by the limited availability of training data, since collecting textual data with the appropriate metadata (information about the authors of the texts) requires a lot of effort. We present RusPersonality, the first Russian-language corpus of written texts labeled with data on their authors. A unique aspect of our corpus is the breadth of the metadata (gender, age, personality, neuropsychological testing data, education level, etc.). Most texts were produced specifically for this corpus, contain no borrowings, and are unedited. The corpus is designed to serve multiple purposes: authorship profiling, authorship attribution, deception detection, genre detection, etc. The corpus currently contains over 1850 documents from 1145 respondents and is still expanding. The average text length is 230 words. The corpus is freely available on request for academic research purposes. The article describes the structure of the corpus, presents the results of research performed at our laboratory using its material, and analyzes the prospects for future studies.