Survey on the Use of Typological Information in Natural Language Processing

In recent years, linguistic typology, which classifies the world's languages according to their functional and structural properties, has been widely used to support multilingual NLP. Although the growing importance of typological information for multilingual tasks has been recognised, no systematic survey of existing typological resources and their use in NLP has been published. This paper provides such a survey, together with a discussion that we hope will both inform and inspire future work in the area.
