tucSage: Grammar Rule Induction for Spoken Dialogue Systems via Probabilistic Candidate Selection

We describe the grammar induction system for Spoken Dialogue Systems (SDS) submitted to SemEval'14: Task 2. A statistical model is trained on a rich feature set and used to select candidate rule fragments. The posterior probabilities produced by this fragment selection model are fused with estimates of phrase-level similarity based on lexical and contextual information. The proposed system is portable across domains and languages: it was experimentally validated on three thematically different domains in two languages.
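The abstract does not say how the two scores are fused, so the following is only a minimal sketch under an assumed scheme: a weighted linear combination of the selection model's posterior and the phrase-level similarity, both taken to lie in [0, 1]. The names `fuse_scores`, `rank_candidates`, and `weight`, as well as the example fragments and scores, are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple


def fuse_scores(posterior: float, similarity: float, weight: float = 0.5) -> float:
    """Linearly combine a posterior probability with a similarity estimate.

    Assumption: both inputs are in [0, 1], and `weight` is the share given
    to the posterior. The paper's actual fusion scheme may differ.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * posterior + (1.0 - weight) * similarity


def rank_candidates(
    candidates: Dict[str, Tuple[float, float]], weight: float = 0.5
) -> List[Tuple[str, float]]:
    """Rank candidate fragments by fused score, highest first.

    `candidates` maps a fragment string to (posterior, similarity).
    """
    scored = [
        (fragment, fuse_scores(posterior, similarity, weight))
        for fragment, (posterior, similarity) in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Illustrative values only; not results from the paper.
    example = {
        "depart from <city>": (0.9, 0.7),
        "on monday": (0.2, 0.3),
    }
    for fragment, score in rank_candidates(example, weight=0.5):
        print(f"{fragment}\t{score:.2f}")
```

In a sketch like this, `weight` would be tuned on held-out data; setting it to 1.0 reduces the ranking to the selection model alone, and 0.0 to similarity alone.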
