Applying Machine Translation Evaluation Techniques to Textual CBR

The need for automated text evaluation is common to several AI disciplines. In this work, we explore the use of Machine Translation (MT) evaluation metrics for Textual Case-Based Reasoning (TCBR). Both MT and TCBR systems propose textual solutions, and both rely on human reference texts for evaluation. Current TCBR evaluation metrics such as precision and recall employ a single human reference, but these metrics are misleading when semantically similar texts are expressed with different sets of keywords. MT metrics overcome this challenge by using multiple human references. Here, we explore the use of multiple references, as opposed to a single reference, applied to incident reports from the medical domain; the additional references are created introspectively from the original dataset using the CBR similarity assumption. Results indicate that TCBR systems evaluated with these new metrics align more closely with human judgements. Unlike in MT, where generated texts can be significantly shorter than the reference, the generated text in TCBR is typically similar in length to the reference, since it is a revised form of an actual solution to a similar problem. We therefore find that some parameters of the MT evaluation measures are not useful for TCBR due to this intrinsic difference in the text generation process.
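The contrast the abstract draws can be sketched in code. The snippet below is a minimal, hypothetical illustration (not the paper's actual metrics): keyword precision/recall against a single reference penalises a candidate that uses different but valid wording, while an MT-style multi-reference score keeps the best match over several references. The BLEU-style brevity penalty is also shown, since it is near 1.0 when candidate and reference lengths match, which is the typical TCBR case the abstract describes. Tokenisation here is a naive whitespace split, an assumption for illustration only.

```python
import math

def keywords(text):
    # Naive tokenisation: lowercase whitespace split (an assumption,
    # not the paper's preprocessing).
    return set(text.lower().split())

def precision_recall(candidate, reference):
    # Single-reference keyword precision and recall.
    cand, ref = keywords(candidate), keywords(reference)
    overlap = len(cand & ref)
    precision = overlap / len(cand) if cand else 0.0
    recall = overlap / len(ref) if ref else 0.0
    return precision, recall

def multi_reference_score(candidate, references):
    # MT-style: score against every reference and keep the best,
    # so a semantically similar rewording is not unfairly penalised.
    return max(precision_recall(candidate, r) for r in references)

def brevity_penalty(cand_len, ref_len):
    # BLEU-style penalty for overly short candidates; it stays at 1.0
    # when lengths match, so it contributes little in TCBR, where
    # generated texts are revised solutions of similar length.
    return 1.0 if cand_len >= ref_len else math.exp(1 - ref_len / cand_len)

# A single reference with different wording scores the candidate low,
# while adding a second plausible reference recovers a fair score.
single = precision_recall("patient fell out of bed",
                          "patient dropped from bed")
multi = multi_reference_score("patient fell out of bed",
                              ["patient dropped from bed",
                               "the patient fell out of bed"])
```

In this toy example the single-reference score is (0.4, 0.5), whereas the multi-reference score rises to a precision of 1.0, mirroring the abstract's point that multiple references avoid penalising semantically similar texts expressed with different keywords.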
