Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that lack human-labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that, to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due both to intrinsic limitations of the databases (in terms of coverage and feature granularity) and to under-utilization of the typological features they include. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of the machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.
