EACH-USP Ensemble Cross-domain Authorship Attribution: Notebook for PAN at CLEF 2018

We present an ensemble approach to cross-domain authorship attribution that combines the predictions of three independent classifiers, built on standard char n-grams, char n-grams with non-diacritic distortion, and word n-grams, respectively. Our proposal relies on variable-length n-gram models and multinomial logistic regression, and outputs the prediction with the highest probability among the three models. The results generally outperform the PAN-CLEF 2018 baseline system, which uses fixed-length char n-grams and linear SVM classification.
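The selection rule described above can be illustrated with a minimal sketch. It assumes each of the three models has already produced a probability for every candidate author of one unknown document. The function name `select_most_confident`, the model labels, the author labels, and the probability values are all hypothetical and chosen only for illustration; they are not taken from the system itself.

```python
from typing import Dict, Tuple


def select_most_confident(
    model_probs: Dict[str, Dict[str, float]]
) -> Tuple[str, str, float]:
    """Return (model, author, probability) for the highest class probability
    found across all models. Ties keep the first entry encountered."""
    best = None
    for model_name, probs in model_probs.items():
        for author, p in probs.items():
            if best is None or p > best[2]:
                best = (model_name, author, p)
    if best is None:
        raise ValueError("no predictions supplied")
    return best


# Hypothetical per-model probabilities for a single unknown document.
example = {
    "char": {"author_a": 0.50, "author_b": 0.30, "author_c": 0.20},
    "char_distorted": {"author_a": 0.25, "author_b": 0.60, "author_c": 0.15},
    "word": {"author_a": 0.40, "author_b": 0.35, "author_c": 0.25},
}

winner = select_most_confident(example)
print(winner)  # ('char_distorted', 'author_b', 0.6)
```

In this toy case the distorted char n-gram model is the most confident of the three, so its prediction is the ensemble output even though the other two models favour a different author. This is what distinguishes the rule from majority voting.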