Multi-lingual Geoparsing based on Machine Translation

Our method for multi-lingual geoparsing combines monolingual tools and resources with machine translation and word alignment to identify location words in many languages. The method saves the time and cost of developing a separate geoparser for each language, and it also makes a wide range of languages available through a single interface. We evaluated the method in our LanguageBridge prototype on location named entities, using newswire, broadcast news and telephone conversation data in English, Arabic and Chinese from the Linguistic Data Consortium (LDC). Our results for geoparsing Chinese and Arabic text with the multi-lingual method are comparable to our results for geoparsing English text with our English tools. Furthermore, experiments using our machine translation approach yield accuracy comparable to that obtained on the same data translated manually.
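
The translate, tag and align idea described above can be illustrated with a minimal Python sketch of the final projection step only. It assumes that the machine translation, the word alignment and the English location tagging have already been produced by external tools. The function name `project_locations`, the data layout and the toy sentence are hypothetical illustrations, not part of LanguageBridge.

```python
from typing import List, Set, Tuple


def project_locations(
    source_tokens: List[str],
    alignment: List[Tuple[int, int]],
    english_location_indices: Set[int],
) -> List[str]:
    """Return source-language tokens aligned to English tokens tagged as locations.

    alignment holds (source_index, english_index) pairs, as a word aligner
    might emit them; english_location_indices holds the positions of English
    tokens that an English geoparser marked as locations. Both are assumed
    inputs for this sketch.
    """
    # Collect every source position linked to a tagged English position.
    hits = {src for src, tgt in alignment if tgt in english_location_indices}
    # Report the matching source tokens in their original order.
    return [source_tokens[i] for i in sorted(hits)]


# Toy example (hypothetical data): a Chinese sentence and its English translation.
source_tokens = ["总统", "昨天", "访问", "了", "巴格达"]
english_tokens = ["The", "president", "visited", "Baghdad", "yesterday"]

# Word alignment as (source_index, english_index); note the reordering.
alignment = [(0, 1), (1, 4), (2, 2), (4, 3)]

# Suppose the English geoparser tagged "Baghdad" (index 3) as a location.
english_location_indices = {3}

print(project_locations(source_tokens, alignment, english_location_indices))
```

Because the projection relies only on alignment indices, the same function applies regardless of the source language, which is what lets a single English geoparser serve several languages in this kind of design.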