Embracing Non-Traditional Linguistic Resources for Low-resource Language Name Tagging

Current supervised name tagging approaches are inadequate for most low-resource languages because annotated data and actionable linguistic knowledge are lacking. All supervised learning methods, including deep neural networks (DNNs), are sensitive to noise, so they are not readily portable without massive amounts of clean annotation. We found that the F-scores of DNN-based name taggers drop rapidly (by 20%-30%) when clean manual annotations in the training data are replaced with noisy ones. We propose a new solution that incorporates many non-traditional, language-universal resources that are readily available but rarely explored in the Natural Language Processing (NLP) community, such as the World Atlas of Language Structures, CIA names, PanLex, and survival guides. We acquire various types of non-traditional linguistic resources and encode them into a DNN name tagger. Experiments on three low-resource languages show that feeding in linguistic knowledge makes the DNN significantly more robust to noise, achieving absolute F-score gains of 8%-22% on name tagging without using any human annotation.
