Variational Autoencoder with Embedded Student-t Mixture Model for Authorship Attribution

Traditional computational authorship attribution is a closed-set classification task: given a finite set of candidate authors and labeled texts from each of them, the objective is to determine which of these authors wrote a further set of anonymous or disputed texts. In this work, we propose a probabilistic autoencoding framework for this supervised classification task. More precisely, we extend a variational autoencoder (VAE) with an embedded Gaussian mixture model to a VAE with an embedded Student-$t$ mixture model. Autoencoders have been highly successful at learning latent representations, but existing VAEs remain limited by the assumed Gaussianity of the underlying probability distributions in the latent space. Replacing the Gaussian components with Student-$t$ components allows independent control of the "heaviness" of the tails of each implied probability density. Experiments on an Amazon review dataset indicate superior performance of the proposed method.
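
The role of the tail "heaviness" can be illustrated with a minimal univariate sketch. It is not the model of this work, which operates on a learned multivariate latent space; the function names, the scalar parameterization (location `mu`, scale `sigma`, degrees of freedom `nu`), and the example values are assumptions made for illustration only. The sketch evaluates a Student-$t$ log-density, compares it with a Gaussian, and computes the posterior probability of each mixture component for a single point, which is the quantity a mixture-based classifier would threshold or maximize.

```python
import math


def student_t_logpdf(x, mu, sigma, nu):
    """Log-density of a univariate Student-t (location mu, scale sigma, nu degrees of freedom)."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def gaussian_logpdf(x, mu, sigma):
    """Log-density of a univariate Gaussian, used here only as the comparison baseline."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z


def responsibilities(x, weights, mus, sigmas, nus):
    """Posterior probability of each Student-t mixture component given x."""
    logs = [math.log(w) + student_t_logpdf(x, m, s, n)
            for w, m, s, n in zip(weights, mus, sigmas, nus)]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # A point six scale units from the center: small nu keeps far more mass there.
    for nu in (1.0, 3.0, 30.0):
        print(nu, student_t_logpdf(6.0, 0.0, 1.0, nu))
    print("gaussian", gaussian_logpdf(6.0, 0.0, 1.0))
    # Two hypothetical components (e.g. two candidate authors) with different tails.
    print(responsibilities(1.0, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0], [3.0, 30.0]))
```

Each component carries its own `nu`, so the tails can be tuned per component: a small `nu` tolerates outliers, while a large `nu` makes the component behave almost like a Gaussian.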
