Dependency Parsing of Code-Switching Data with Cross-Lingual Feature Representations

This paper describes a test of a dependency parsing method based on bidirectional LSTM feature representations and multilingual word embeddings, and evaluates the results on monolingual and multilingual data. The results are similar in all cases, with slightly better results on the multilingual data. The languages under investigation are Komi-Zyrian and Russian. Examination of the results by relation type shows that some language-specific constructions are correctly recognized even when they appear in naturally occurring code-switching data.

Tiivistelmä (Finnish abstract): The study evaluates a dependency parsing method based on a bidirectional LSTM feature representation and a multilingual word embedding model, and evaluates the results on monolingual and multilingual data. The results are similar, but slightly higher on multilingual than on monolingual data. The languages studied are Komi-Zyrian and Russian. A more detailed analysis of the results by dependency relation shows that certain language-specific relations are correctly recognized even when they occur in naturally occurring sentences containing code-switching.

This work is licensed under a Creative Commons Attribution 4.0 International Licence. Licence details:
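The abstract reports results broken down by dependency relation type. As a hedged illustration of that kind of evaluation (not the authors' own scoring procedure), the sketch below computes the labeled attachment score (LAS) overall and per gold relation from gold and predicted CoNLL-U data with identical tokenization. The function names `read_conllu` and `las_by_relation`, the toy tokens `a b c`, and the choice to ignore relation subtypes by default are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (HEAD, DEPREL) for one syntactic word; HEAD 0 = root, as in CoNLL-U.
Token = Tuple[int, str]


def read_conllu(text: str) -> List[List[Token]]:
    """Minimal (hypothetical) CoNLL-U reader returning (HEAD, DEPREL) per word.

    Comment lines, multiword-token ranges (e.g. "3-4") and empty nodes
    (e.g. "5.1") are skipped; a blank line ends a sentence.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append((int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def las_by_relation(
    gold: List[List[Token]], pred: List[List[Token]], strip_subtypes: bool = True
) -> Tuple[float, Dict[str, float]]:
    """Return (overall LAS, {gold relation: LAS}) for aligned parses.

    Assumes identical tokenization, so words are compared position by position.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data differ in sentence count")
    total: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)

    def norm(rel: str) -> str:
        # Assumption: optionally compare only the universal part, e.g. "nmod:poss" -> "nmod".
        return rel.split(":")[0] if strip_subtypes else rel

    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("token counts differ; tokenization must match")
        for (g_head, g_rel), (p_head, p_rel) in zip(g_sent, p_sent):
            rel = norm(g_rel)
            total[rel] += 1
            # LAS: a word is correct only if both its head and its label match.
            if g_head == p_head and rel == norm(p_rel):
                correct[rel] += 1
    overall = sum(correct.values()) / max(sum(total.values()), 1)
    return overall, {rel: correct[rel] / total[rel] for rel in sorted(total)}


# Toy data with hypothetical tokens "a b c" (not from the paper's corpus).
GOLD = "\n".join([
    "# text = a b c",
    "1\ta\t_\t_\t_\t_\t2\tnsubj\t_\t_",
    "2\tb\t_\t_\t_\t_\t0\troot\t_\t_",
    "3\tc\t_\t_\t_\t_\t2\tobj\t_\t_",
])
PRED = "\n".join([
    "1\ta\t_\t_\t_\t_\t2\tnsubj:pass\t_\t_",
    "2\tb\t_\t_\t_\t_\t0\troot\t_\t_",
    "3\tc\t_\t_\t_\t_\t2\tobl\t_\t_",
])

if __name__ == "__main__":
    overall, per_rel = las_by_relation(read_conllu(GOLD), read_conllu(PRED))
    print(f"LAS = {overall:.3f}")
    for rel, score in per_rel.items():
        print(f"{rel:>8}  {score:.3f}")
```

With subtypes ignored, the toy prediction gets `nsubj` and `root` right and `obj` wrong, so the per-relation view shows exactly which construction failed. Breaking LAS down by relation in this way is what makes it possible to see whether specific constructions survive in code-switched sentences, which an overall score alone would hide.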