Semiautomatic Acquisition of Semantic Structures for Understanding Domain-Specific Natural Language Queries

This paper describes a methodology for semiautomatic grammar induction from unannotated corpora of information-seeking queries in a restricted domain. The grammar contains both semantic and syntactic structures, which are conducive to (spoken) natural language understanding. Our work aims to reduce the reliance of grammar development on expert handcrafting or on the availability of annotated corpora. To achieve reasonable coverage on real data, as well as portability across domains and languages, we adopt a statistical approach. Agglomerative clustering using the symmetrized divergence criterion groups words "spatially": these words have similar left and right contexts and tend to form semantic classes. Agglomerative clustering using mutual information groups words "temporally": these words tend to co-occur sequentially to form phrases or multiword entities. Our approach allows the optional injection of prior knowledge to catalyze grammar induction. The resultant grammar is interpretable by humans and can be hand-edited for refinement. Hence, our approach is semiautomatic in nature. Experiments were conducted using the ATIS (Air Travel Information Service) corpus, and the semiautomatically induced grammar G_SA is compared to an entirely handcrafted grammar G_H. G_H took two months to develop and gave concept error rates of 7 percent and 11.3 percent, respectively, in language understanding of two test corpora. G_SA took only three days to produce and gave concept error rates of 14 percent and 12.2 percent on the corresponding test corpora. These results indicate a favorable trade-off between language understanding performance and grammar development effort.
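The two clustering criteria named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the toy corpus, the function names, the additive smoothing constant `eps`, the use of a symmetrized Kullback-Leibler divergence summed over left and right neighbor distributions, and the use of pointwise mutual information over adjacent word pairs. The sketch only computes the two scores; the agglomerative merging loop is omitted.

```python
import math
from collections import Counter, defaultdict

def context_distributions(sentences):
    """Count left and right neighbors of every word (toy whitespace tokenizer)."""
    left, right = defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
            left[word][prev] += 1
            right[word][nxt] += 1
    return left, right

def kl(p, q, vocab, eps=1e-3):
    """D(p || q) over vocab; eps is an assumed additive smoothing constant."""
    p_total = sum(p.values()) + eps * len(vocab)
    q_total = sum(q.values()) + eps * len(vocab)
    total = 0.0
    for v in vocab:
        pv = (p[v] + eps) / p_total
        qv = (q[v] + eps) / q_total
        total += pv * math.log(pv / qv)
    return total

def symmetrized_divergence(w1, w2, left, right, vocab):
    """Sum D(p||q) + D(q||p) over left and right contexts ("spatial" similarity)."""
    d = 0.0
    for ctx in (left, right):
        d += kl(ctx[w1], ctx[w2], vocab) + kl(ctx[w2], ctx[w1], vocab)
    return d

def bigram_mutual_information(sentences):
    """Pointwise mutual information of adjacent word pairs ("temporal" cohesion)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = sent.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    return {
        (a, b): math.log((c / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        for (a, b), c in bigrams.items()
    }

# Hypothetical ATIS-style queries, invented for this sketch.
sentences = [
    "show flights from boston to denver",
    "show flights from denver to boston",
    "show flights from new york to boston",
    "show fares from new york to denver",
]
left, right = context_distributions(sentences)
vocab = {"<s>", "</s>"} | {w for s in sentences for w in s.split()}
mi = bigram_mutual_information(sentences)

print(symmetrized_divergence("boston", "denver", left, right, vocab))
print(symmetrized_divergence("boston", "from", left, right, vocab))
print(mi[("new", "york")], mi[("from", "boston")])
```

In this toy corpus, "boston" and "denver" share the same neighbor counts, so their divergence is lower than that of "boston" and "from", which is the kind of evidence that would merge them into one semantic class. Likewise, "new york" scores higher mutual information than "from boston", making it the better candidate for a multiword entity.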
