Embedded machine learning systems for robust spoken language parsing

In processing ill-formed spontaneous spoken utterances, many state-of-the-art robust parsers achieve robustness by allowing words and rule symbols to be skipped. This ability, however, results in a much larger search space and greatly increases parse ambiguity. Previous approaches addressed these issues by manually labeling the types of rule symbols, or by using heuristic scores or statistical probabilities, but each of these approaches has drawbacks. This paper proposes to exploit embedded machine learning techniques to help with pruning and disambiguation in robust parsers. An embedded machine learning system is integrated with the heuristic score and with the strategy of basing the types of rule symbols on their correspondence to the domain model. This integration can considerably reduce the reliance of robust parser development on handcrafting by linguistic experts. Our experiments show that the integration resolves ambiguity more effectively, thereby enabling the robust parser to achieve better parsing accuracy.
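To illustrate why skipping enlarges the search space, the following is a minimal sketch, not the parser described in the paper. It assumes a hypothetical rule that is just a flat sequence of terminals, and it counts how many distinct alignments the rule has against an utterance when words may be skipped. The function name `count_matches`, the rule, and the utterance are all illustrative assumptions; skipping of rule symbols is not modeled here.

```python
def count_matches(rule, words, allow_skips):
    """Count the ways a flat rule (list of terminals) aligns to an utterance.

    Illustrative toy only: real robust parsers use grammars with
    nonterminals and may also skip rule symbols, which this does not model.
    """
    if not allow_skips:
        # Strict matching: the rule must cover the utterance exactly.
        return 1 if list(rule) == list(words) else 0

    # ways[j] = number of alignments of rule[:j] to the words read so far,
    # where any number of words may be skipped.
    ways = [1] + [0] * len(rule)
    for word in words:
        # Go right to left so each word is used at most once per alignment.
        for j in range(len(rule), 0, -1):
            if rule[j - 1] == word:
                ways[j] += ways[j - 1]
    return ways[len(rule)]


rule = ["flight", "to", "boston"]
utterance = "i want a flight to to boston uh boston".split()

strict = count_matches(rule, utterance, allow_skips=False)
robust = count_matches(rule, utterance, allow_skips=True)
print(strict, robust)
```

With strict matching the disfluent utterance is rejected outright, whereas with word skipping the same rule matches in several ways, each of which a robust parser would have to prune or rank. This is the kind of ambiguity that the heuristic scores and the embedded learner are meant to resolve.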
