Metamorphic Testing for Machine Translations: MT4MT

Automated machine translation software and services have become widely available and increasingly popular. Due to the complexity and flexibility of natural languages, automated testing and quality assessment of this type of software are extremely challenging, especially in the absence of a human oracle or a reference translation. Furthermore, even when a reference translation is available, some major evaluation metrics, such as BLEU, are not reliable on short sentences, the type of sentence now prevailing on the Internet. To alleviate these problems, we have been using a metamorphic testing technique to test machine translation services fully automatically, without involving any human assessor or reference translation. This article reports on our progress and presents some interesting preliminary experimental results that reveal quality issues in English-to-Chinese translations produced by two mainstream machine translation services: Google Translate and Microsoft Translator. These preliminary results demonstrate the usefulness and potential of metamorphic testing for applications in the natural language processing domain.
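To make the idea of oracle-free testing concrete, the following is a minimal sketch of one possible metamorphic relation, not necessarily the one used in this work. It assumes that replacing a single proper noun in the source sentence should change the translation in at most one token. The functions `toy_translate` and `flaky_translate` are hypothetical stand-ins for a translation service, and the threshold `max_changed` is an illustrative choice.

```python
from difflib import SequenceMatcher


def toy_translate(sentence: str) -> str:
    """Hypothetical stand-in for a translation service: word-by-word lookup."""
    lexicon = {"tom": "汤姆", "mike": "迈克", "likes": "喜欢", "apples": "苹果"}
    return " ".join(lexicon.get(word.lower(), word) for word in sentence.split())


def flaky_translate(sentence: str) -> str:
    """Hypothetical faulty service: restructures the output for one input."""
    if "Mike" in sentence:
        return "迈克 被 苹果 喜欢"
    return toy_translate(sentence)


def changed_tokens(a: str, b: str) -> int:
    """Count the tokens of the longer output not covered by matching blocks."""
    tokens_a, tokens_b = a.split(), b.split()
    matcher = SequenceMatcher(None, tokens_a, tokens_b)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return max(len(tokens_a), len(tokens_b)) - matched


def violates_relation(translate, source, old, new, max_changed=1):
    """Assumed relation: swapping one name changes at most `max_changed` tokens."""
    follow_up = source.replace(old, new)
    return changed_tokens(translate(source), translate(follow_up)) > max_changed


if __name__ == "__main__":
    for service in (toy_translate, flaky_translate):
        flagged = violates_relation(service, "Tom likes apples", "Tom", "Mike")
        print(service.__name__, "violates relation:", flagged)
```

No reference translation is consulted at any point: the check compares two outputs of the same service against each other, so a violation flags a suspicious pair of translations rather than proving which one is wrong.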
