Metadata records machine translation combining multi‐engine outputs with limited parallel data

One way to facilitate Multilingual Information Access (MLIA) for digital libraries is to generate multilingual metadata records by applying Machine Translation (MT) techniques. Online MT services are readily available and affordable, but they are not always effective for creating multilingual metadata records. In this study, we implemented three MT strategies and evaluated their performance when translating English metadata records into Chinese and Spanish. The strategies combined MT results from three online MT systems (Google, Bing, and Yahoo!), with and without additional linguistic resources such as manually generated parallel corpora and metadata records in the two target languages obtained from international partners. The open-source statistical MT platform Moses was used to design and implement the three strategies. Human evaluation of the MT results in terms of adequacy and fluency showed that two of the strategies produced higher-quality translations than the individual online MT systems for both languages. In particular, adding small, manually generated parallel corpora of metadata records significantly improved translation performance. Our study suggests an effective and efficient MT approach for providing multilingual services for digital collections.
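As a rough illustration of how human adequacy and fluency judgments can be summarized per system, the following minimal sketch averages ratings for each MT system. The 1-to-5 rating scale, the system labels, the function name `mean_scores`, and every rating value are assumptions made for illustration only; they are not data or procedures reported in the study.

```python
from collections import defaultdict
from statistics import mean


def mean_scores(ratings):
    """Average adequacy and fluency per system.

    ratings: iterable of (system, adequacy, fluency) tuples.
    Returns {system: (mean_adequacy, mean_fluency)}.
    """
    by_system = defaultdict(lambda: ([], []))
    for system, adequacy, fluency in ratings:
        by_system[system][0].append(adequacy)
        by_system[system][1].append(fluency)
    return {s: (mean(a), mean(f)) for s, (a, f) in by_system.items()}


# Hypothetical ratings on an assumed 1-5 scale (not results from the study).
ratings = [
    ("engine_A", 3, 4),
    ("engine_A", 4, 4),
    ("combined", 5, 4),
    ("combined", 4, 5),
]

scores = mean_scores(ratings)
for system, (adequacy, fluency) in scores.items():
    print(system, adequacy, fluency)
```

Comparing per-system means in this way is only a first step; a real evaluation would also need a significance test and a check of agreement between raters.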
