Paper information - Task-Based Evaluation of NLG Systems: Control vs Real-World Context

Currently there is little agreement about, or even discussion of, methodologies for task-based evaluation of NLG systems. I discuss one specific issue in this area, namely the importance of control vs the importance of ecological validity (real-world context), and suggest that perhaps we need to put more emphasis on ecological validity in NLG evaluations.

Ehud Reiter

[1] R. Michael Young, et al. Using Grice's maxim of Quantity to select the content of plan descriptions, 1999, Artif. Intell.

[2] Jim Hunter, et al. Automatic Generation of Textual Summaries from Neonatal Intensive Care Data, 2007, AIME.

[3] Albert Gatt, et al. BT-Nurse: computer generation of natural language shift summaries from complex heterogeneous medical data, 2011, J. Am. Medical Informatics Assoc.

[4] Albert Gatt, et al. From data to text in the Neonatal Intensive Care Unit: Using NLG technology for decision support and information management, 2009, AI Commun.

[5] Ehud Reiter, et al. Lessons from a failure: Generating tailored smoking cessation letters, 2003, Artif. Intell.

[6] P. Donnan, et al. Cost effectiveness of computer tailored and non-tailored smoking cessation letters in general practice: randomised controlled trial, 2001, BMJ: British Medical Journal.

[7] Davide Fossati, et al. Aggregation Improves Learning: Experiments in Natural Language Generation for Intelligent Tutoring Systems, 2005, ACL.

[8] R. Logie, et al. When a graph is poorer than 100 words: A comparison of computerised natural language generation, human generated descriptions and graphical displays in neonatal intensive care, 2010.

[9] Catherine Plaisant, et al. The challenge of information visualization evaluation, 2004, AVI.