Paper information - Data Augmentation via Dependency Tree Morphing for Low-Resource Languages

Data Augmentation via Dependency Tree Morphing for Low-Resource Languages

Neural NLP systems achieve high scores in the presence of a sizable training dataset. Lack of such datasets leads to poor system performance in the case of low-resource languages. We present two simple text augmentation techniques using dependency trees, inspired by image processing. We “crop” sentences by removing dependency links, and we “rotate” sentences by moving the tree fragments around the root. We apply these techniques to augment the training sets of low-resource languages in the Universal Dependencies project. We implement a character-level sequence tagging model and evaluate the augmented datasets on the part-of-speech tagging task. We show that crop and rotate provide improvements over the models trained with non-augmented data for the majority of the languages, especially for languages with rich case marking systems.
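The abstract describes the two operations only in outline, so the following is a minimal sketch of one way they could work, not the authors' implementation. It assumes a projective tree stored as (form, head, deprel) triples with 1-based heads and 0 for the root. The function names, the example sentence, and the choice to crop by a single dependency relation are all hypothetical.

```python
from itertools import permutations

# Hypothetical example sentence. Each token is (form, head, deprel);
# heads are 1-based and 0 marks the root.
SENTENCE = [
    ("She", 2, "nsubj"),
    ("wrote", 0, "root"),
    ("a", 4, "det"),
    ("letter", 2, "obj"),
    ("yesterday", 2, "advmod"),
]


def subtree(tokens, idx):
    """Return the sorted 1-based indices of token idx and its descendants."""
    nodes = {idx}
    changed = True
    while changed:
        changed = False
        for i, (_, head, _) in enumerate(tokens, start=1):
            if head in nodes and i not in nodes:
                nodes.add(i)
                changed = True
    return sorted(nodes)


def root_fragments(tokens):
    """Return the root index and the subtree hanging off each root child."""
    root = next(i for i, tok in enumerate(tokens, start=1) if tok[1] == 0)
    children = [i for i, tok in enumerate(tokens, start=1) if tok[1] == root]
    return root, {child: subtree(tokens, child) for child in children}


def crop(tokens, keep_rel):
    """Keep the root plus the fragments attached by keep_rel; drop the rest."""
    root, fragments = root_fragments(tokens)
    keep = {root}
    for child, nodes in fragments.items():
        if tokens[child - 1][2] == keep_rel:
            keep.update(nodes)
    return [tokens[i - 1][0] for i in sorted(keep)]


def rotate(tokens):
    """Reorder the root and its fragments; each fragment stays intact."""
    root, fragments = root_fragments(tokens)
    units = [[root]] + list(fragments.values())
    return [
        [tokens[i - 1][0] for unit in perm for i in unit]
        for perm in permutations(units)
    ]


print(crop(SENTENCE, "nsubj"))
print(crop(SENTENCE, "obj"))
print(len(rotate(SENTENCE)))
```

In this sketch, cropping on `nsubj` keeps only the subject fragment and the root, while rotation enumerates every ordering of the root and its three fragments. A real pipeline would likely sample a few of these orderings, carry the part-of-speech tags along with the word forms, and restrict which relations may be cropped or moved.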

Gözde Gül Sahin | Mark Steedman
