SemEval-2014 Task 2: Grammar Induction for Spoken Dialogue Systems

In this paper we present SemEval-2014 Task 2 on spoken dialogue grammar induction. The task is to classify a lexical fragment into the appropriate semantic category (grammar rule) in order to construct a grammar for spoken dialogue systems. We describe four subtasks covering two languages, English and Greek, and three speech application domains: travel reservation, tourism and finance. The classification results are compared against the ground truth, and weighted and unweighted precision, recall and F-measure are reported. Three sites participated in the task with five systems, employing a variety of features and in some cases using external resources for training. The submissions significantly outperform the baseline, achieving an F-measure of 0.69 compared to 0.56 for the baseline, averaged across all subtasks.
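The abstract reports weighted and unweighted precision, recall and F-measure but does not spell out how the averages are formed. The sketch below assumes the common convention: per-category scores are computed first, then averaged either uniformly over categories (unweighted) or in proportion to each category's number of ground-truth fragments (weighted). The function names and the category labels `<CITY>`, `<DATE>` and `<TIME>` are hypothetical illustrations, not the task's official scorer.

```python
from collections import Counter


def per_class_prf(gold, pred):
    """Return {label: (precision, recall, f1)} for every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # fragment assigned to its correct grammar rule
        else:
            fp[p] += 1  # wrong rule received an extra fragment
            fn[g] += 1  # correct rule missed this fragment
    scores = {}
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[lab] = (prec, rec, f1)
    return scores


def averaged_prf(gold, pred, weighted):
    """Average per-category P/R/F uniformly (weighted=False) or by gold support (weighted=True)."""
    scores = per_class_prf(gold, pred)
    support = Counter(gold)
    if weighted:
        # Categories that never occur in the gold data get weight 0.
        weights = {lab: support[lab] / len(gold) for lab in scores}
    else:
        weights = {lab: 1 / len(scores) for lab in scores}
    return tuple(sum(weights[lab] * scores[lab][i] for lab in scores) for i in range(3))


# Toy example with hypothetical travel-domain categories.
gold = ["<CITY>", "<CITY>", "<DATE>", "<TIME>"]
pred = ["<CITY>", "<DATE>", "<DATE>", "<TIME>"]
print("unweighted P/R/F:", averaged_prf(gold, pred, weighted=False))
print("weighted   P/R/F:", averaged_prf(gold, pred, weighted=True))
```

In the toy example the frequent `<CITY>` category has low recall, so the weighted recall comes out lower than the unweighted recall; the weighted figures emphasize the categories that contain the most fragments.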
