Attribute Obfuscation with Gradient Reversal

Recent advances in computational stylometry have demonstrated that automatically inferring an extensive set of personal attributes from text alone (e.g. gender, age, education, socio-economic status, mental health issues) is not only feasible, but often requires little supervision. This opens up the potential for both industry and academia to uncover 'hidden' demographics for a large volume of social media accounts. It can safely be assumed that the majority of users of these media are not aware of the latent information they are sharing, creating a false sense of privacy that can easily be abused by third parties. Even if they were aware, they would have no countermeasures at their disposal other than self-censorship.

One of the proposed computational methods for assisting users in guarding particular attributes is author and/or attribute obfuscation, where the goal is to rewrite a text in such a way that a classifier trained to detect the author (or their attributes) is fooled. Most work on this topic has focused on rule-based perturbations of the text input, demonstrating only minor gains. Our proposal is to use a text encoder-decoder model that learns intermediate representations which are invariant to the protected attributes and which, thanks to this property, can rewrite user text in a way that largely preserves its meaning while concealing user identity and/or attributes.
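
To make the general idea concrete, the sketch below illustrates one common way to obtain attribute-invariant representations: a gradient reversal layer placed between the encoder and an adversarial attribute classifier, so that the classifier is trained to predict the protected attribute while the encoder is pushed to erase it. This is only a minimal PyTorch-style illustration under our own assumptions; the module names, layer sizes, and the `lambda_` scaling factor are illustrative, not the exact architecture proposed here.

```python
# Minimal sketch (assumption: PyTorch; all sizes and names are illustrative).
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips (and scales) gradients on the backward pass."""

    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing back into the encoder, scaled by lambda_.
        return grad_output.neg() * ctx.lambda_, None


def grad_reverse(x, lambda_=1.0):
    return GradReverse.apply(x, lambda_)


class ObfuscatingAutoencoder(nn.Module):
    """Encoder-decoder with an adversarial attribute classifier on the latent code."""

    def __init__(self, vocab_size, hidden=256, n_attr_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)            # reconstruction head
        self.attr_clf = nn.Linear(hidden, n_attr_classes)   # adversarial head

    def forward(self, tokens, lambda_=1.0):
        emb = self.embed(tokens)
        _, h = self.encoder(emb)
        dec_states, _ = self.decoder(emb, h)
        recon_logits = self.out(dec_states)
        # The attribute classifier sees the latent code through the reversal layer,
        # so minimizing its loss drives the encoder toward attribute-invariant codes.
        attr_logits = self.attr_clf(grad_reverse(h[-1], lambda_))
        return recon_logits, attr_logits
```

In such a setup, training would sum a reconstruction loss over `recon_logits` (to preserve meaning) and a cross-entropy loss over `attr_logits`; because of the reversal layer, a single backward pass simultaneously trains the classifier to recover the protected attribute and the encoder to conceal it.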