An Ensemble-Rich Multi-Aspect Approach Towards Robust Style Change Detection: Notebook for PAN at CLEF 2018

We describe the winning system for the PAN@CLEF 2018 task on Style Change Detection. Given a document, the goal is to determine whether it contains a style change. We present our supervised approach, which combines a TF.IDF representation of the documents with features specifically engineered for the task, and which makes predictions using an ensemble of diverse models, including SVM, Random Forest, AdaBoost, MLP, and LightGBM. We further perform a comparative analysis of the models' performance on three different datasets, two of which we developed for the task. Moreover, we release our code in order to enable further research.
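To make the overall shape of such a system concrete, the following is a minimal sketch in scikit-learn, not the released implementation. It concatenates TF.IDF features with two hand-crafted features and passes them to a majority-vote ensemble. Several parts are assumptions made for illustration: the `style_features` function and its two features are hypothetical, majority voting is only one possible way to combine the models, all hyperparameters are arbitrary, the toy documents and labels are invented, and scikit-learn's `GradientBoostingClassifier` stands in for LightGBM so that the example needs no extra dependency.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC


def style_features(docs):
    """Hypothetical engineered features: how much the two halves of a document differ."""
    rows = []
    for doc in docs:
        words = doc.split()
        mid = len(words) // 2
        # Fall back to a single empty token so the means below are always defined.
        halves = [words[:mid] or [""], words[mid:] or [""]]
        mean_len = [np.mean([len(w) for w in h]) for h in halves]
        comma_rate = [sum(w.endswith(",") for w in h) / len(h) for h in halves]
        rows.append([abs(mean_len[0] - mean_len[1]),
                     abs(comma_rate[0] - comma_rate[1])])
    return np.array(rows)


def build_model(seed=0):
    # Sparse TF.IDF columns and dense engineered columns are stacked side by side.
    features = FeatureUnion([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)),
        ("style", FunctionTransformer(style_features)),
    ])
    # Assumption: simple majority voting over the five model families.
    ensemble = VotingClassifier(
        estimators=[
            ("svm", SVC(kernel="linear")),
            ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
            ("ada", AdaBoostClassifier(random_state=seed)),
            ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                  random_state=seed)),
            # Stand-in for LightGBM: another gradient-boosted tree model.
            ("gbdt", GradientBoostingClassifier(random_state=seed)),
        ],
        voting="hard",
    )
    return make_pipeline(features, ensemble)


# Invented toy data: label 1 means the document contains a style change.
docs = [
    "We went out. It was fine. Notwithstanding considerable meteorological uncertainty, departure proceeded.",
    "The cat sat. The dog ran. Subsequently, extraordinarily complicated, circumstances materialised unexpectedly.",
    "I like tea. You like tea. Furthermore, comprehensive, deliberations, concerning beverages continued.",
    "He is tall. She is not. Consequently, anthropometric, measurements, demonstrated substantial variation.",
    "We went out. It was fine. We came back. It was late.",
    "The cat sat. The dog ran. The bird flew. The fish swam.",
    "I like tea. You like tea. We like tea. They like tea.",
    "He is tall. She is not. I am not. We are here.",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = build_model().fit(docs, labels)
predictions = model.predict(docs)
```

Hard voting is used here because it needs only class labels from each model; averaging predicted probabilities or stacking a meta-classifier on top would be equally plausible ways to combine them.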
