A Survey on Zero Pronoun Translation

Pronouns are frequently omitted in pro-drop languages (e.g. Chinese, Hungarian, and Hindi), leaving zero pronouns (ZPs) that must be recovered when translating into non-pro-drop languages (e.g. English). This phenomenon has been studied extensively in machine translation (MT), as it poses a significant challenge for MT systems due to the difficulty of determining the correct antecedent for the pronoun. This survey highlights the major work undertaken in zero pronoun translation (ZPT) since the neural revolution, so that researchers can recognise the current state and future directions of the field. We organise the literature by evolution, dataset, method and evaluation. In addition, we compare and analyse competing models and evaluation metrics on different benchmarks. We uncover a number of findings, including: 1) ZPT is in line with the development trend of large language models; 2) data limitations cause learning bias across languages and domains; 3) performance improvements are often reported on single benchmarks, but advanced methods are still far from real-world use; 4) general-purpose metrics are not reliable for the nuances and complexities of ZPT, which underlines the need for targeted metrics; 5) apart from commonly cited errors, ZPs introduce a risk of gender bias.
