Evaluation of an NLG System using Post-Edit Data: Lessons Learnt

Post-editing is commonly performed on computer-generated texts, whether from Machine Translation (MT) or NLG systems, to make the texts acceptable to end users. MT systems are often evaluated using post-edit data. In this paper we describe our experience of using post-edit data to evaluate SUMTIME-MOUSAM, an NLG system that produces marine weather forecasts.
