Would it be helpful or detrimental for the field of NLG to have a generally accepted competition? Competitions have undoubtedly advanced the state of the art in some fields of NLP, but the benefits sometimes come at the price of over-competitiveness, and there is a danger of overfitting systems to the specific evaluation metrics. Moreover, it has been argued that there are intrinsic difficulties in NLG that make it harder to evaluate than other NLP tasks (Scott and Moore, 2006). We agree that NLG is too diverse for a single “competition” and that it lacks mutually accepted evaluation metrics. Instead, we suggest that most of the positive effects, while incurring only a few of the negative ones, can be achieved by putting forth a challenge to the community. Research teams would implement systems that address various aspects of the challenge. These systems would then be evaluated regularly, and the results compared at a workshop. There would be no “winner” in the sense of a competition; rather, the focus would be on learning what works and what doesn’t, building on the best ideas, and perhaps reusing the best modules in the next year’s round. As a side effect, the exercise should produce a growing body of shareable tools and modules.