Discovering Stylistic Variations in Distributional Vector Space Models via Lexical Paraphrases

Detecting and analyzing stylistic variation in language is relevant to diverse Natural Language Processing applications. In this work, we investigate whether salient dimensions of stylistic variation are embedded in standard distributional vector spaces of word meaning. We hypothesize that distances between embeddings of lexical paraphrases can help isolate stylistic variation from meaning variation and help identify latent style dimensions. We conduct a qualitative analysis of these latent style dimensions and show the effectiveness of the identified style subspaces on a lexical formality prediction task.
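To make the hypothesis concrete, the following is a minimal sketch of one way latent style directions might be extracted from paraphrase pairs. It is not necessarily the procedure used in this work. It assumes that style directions can be approximated by the principal directions of the difference vectors between paraphrase embeddings. The function names, the toy three-dimensional embeddings, and the word pairs are all hypothetical and chosen only for illustration.

```python
import numpy as np

def style_directions(embeddings, paraphrase_pairs, k=1):
    """Return the top-k principal directions of paraphrase difference vectors.

    Assumption: paraphrases share meaning, so their difference vectors
    mostly reflect non-meaning variation such as style.
    """
    diffs = np.array([embeddings[a] - embeddings[b] for a, b in paraphrase_pairs])
    # Uncentered PCA via SVD; rows of vt are unit-length directions.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    directions = vt[:k]
    # SVD signs are arbitrary: orient each direction so that the first
    # word of a pair scores higher than the second on average.
    signs = np.sign((diffs @ directions.T).mean(axis=0))
    signs[signs == 0] = 1.0
    return directions * signs[:, None]

def style_score(embeddings, word, direction):
    """Project a word embedding onto a style direction."""
    return float(embeddings[word] @ direction)

# Hypothetical toy embeddings: the first two coordinates stand in for
# meaning, the third for a formality-like style component.
toy = {
    "purchase": np.array([1.0, 0.0, 1.0]),
    "buy":      np.array([1.0, 0.0, -1.0]),
    "assist":   np.array([0.0, 1.0, 1.0]),
    "help":     np.array([0.0, 1.0, -1.0]),
    "commence": np.array([1.0, 1.0, 1.0]),
    "start":    np.array([1.0, 1.0, -1.0]),
}
pairs = [("purchase", "buy"), ("assist", "help"), ("commence", "start")]

direction = style_directions(toy, pairs, k=1)[0]
formal_score = style_score(toy, "purchase", direction)
informal_score = style_score(toy, "buy", direction)
```

In this toy setting the recovered direction aligns with the third coordinate, so projecting a word onto it separates the more formal member of each pair from the less formal one. With real embeddings the leading directions would need qualitative inspection before being interpreted as style dimensions.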
