Towards Improved Model Design for Authorship Identification: A Survey on Writing Style Understanding

Authorship identification tasks, which rely heavily on linguistic style, have long been an important part of Natural Language Understanding (NLU) research. Other tasks based on linguistic style understanding have benefited from deep learning methods. On many authorship-based tasks, however, these methods have not performed as well as traditional machine learning methods. As these tasks become increasingly challenging, traditional machine learning methods based on handcrafted feature sets are approaching their performance limits. To inspire future applications of deep learning to authorship-based tasks, particularly in ways that benefit the extraction of stylistic features, we survey authorship-based tasks together with other tasks related to writing style understanding. We first present our findings on the current state of research in both groups of tasks and summarize the existing achievements and open problems in authorship-related tasks. We then describe outstanding methods from style-related tasks in general and analyze how they are combined in the top-performing models. We are optimistic about the applicability of these models to authorship-based tasks and hope that our survey will help advance research in this field.