Cross-lingual syntactic variation over age and gender

Most computational sociolinguistics studies have focused on phonological and lexical variation. We present the first large-scale study of syntactic variation among demographic groups (age and gender) across several languages. We harvest data from online user-review sites and parse it with Universal Dependencies. We show that several age- and gender-specific variations hold across languages, for example that women are more likely than men to use VP conjunctions.
