Abstract

Inspired by the well-known method for confidence measure calculation via estimation of word posterior probabilities on the word graph, we devised a technique to estimate confidences on all levels of the hierarchically structured output of our one-stage decoder for interpretation of natural speech (ODINS). By constructing a nested lattice hierarchy, the generalized counterpart of the word graph, we estimate posterior probabilities for all nodes in the decoded semantic tree, namely for all contained semantic units and words. The experimental results show that the tree node confidence measure performs significantly better than the confidence error baseline, regardless of whether the evaluation is carried out on tree nodes representing semantic concepts, word classes, or words. Furthermore, the paper proposes possible applications of the tree node confidences to improve the grounding strategy of spoken dialogue systems.

1. Introduction

For the robust recognition of application-specific information, a spoken dialogue system can benefit a great deal from confidence measures delivered by the underlying speech recognition engine. At the word level, there are efficient methods for computing confidence measures [1]. However, the speech interpreting component of the dialogue system usually derives a hierarchically structured semantic representation of the user's utterance that comprises more complex units than words, e.g. semantic concepts or word classes. Thus, in addition to word confidences, the dialogue system needs higher-level confidences related to these semantic units in order to safeguard the recognized structured content and to generate feedback in an adequate way.

Recent publications [2, 3] suggested incorporating word confidences, together with various other features extracted during the speech recognition and interpretation process, into a classifier that assigns a confidence to each recognized semantic unit. The classifiers used (multi-layer perceptrons in [2] and decision trees in [3]) need explicit training before their application. A different approach is proposed in [4], which exclusively uses the primary knowledge sources of speech recognition and interpretation for confidence estimation. There, the common method for word posterior probability calculation on the word graph [1] was extended to estimate concept posterior probabilities on a so-called concept graph, which is generated from an intermediate word graph by semantic parsing using stochastic context-free grammars. However, the resulting concept posteriors were applied to enhance word confidences and have not been evaluated as semantic confidences.
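To make this starting point concrete, the common word-graph method referred to above can be sketched as follows; this is a schematic formulation only, omitting the score scaling and the handling of time-overlapping hypotheses detailed in the literature. The posterior probability of a word-graph edge [w; τ, t], i.e. the hypothesis that word w was spoken between frames τ and t, is the posterior mass of all graph paths W that contain this edge:

\[
p\bigl([w;\tau,t] \mid x_1^T\bigr)
= \frac{\displaystyle\sum_{W \,\ni\, [w;\tau,t]} p(x_1^T \mid W)\, p(W)}
       {\displaystyle\sum_{W} p(x_1^T \mid W)\, p(W)}
\]

Both sums run over the word sequences encoded in the graph and are computed efficiently with a forward-backward pass over the graph edges; word confidences are then obtained from these edge posteriors, e.g. by accumulating the posterior mass of identically labeled, time-overlapping edges. The nested lattice hierarchy described in the abstract carries this normalization idea over from word-graph edges to the nodes of the decoded semantic tree.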
References

[1] M. Thomae et al., "Tree matching for evaluation of speech interpretation systems," in Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2003.
[2] Y. Normandin et al., "Robust semantic confidence scoring," in Proc. INTERSPEECH, 2002.
[3] W. H. Ward et al., "Estimating semantic confidence for spoken dialogue systems," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2002.
[4] S. Young et al., "Token passing: a simple conceptual model for connected speech recognition systems," 1989.
[5] H. Ney et al., "Confidence measures for large vocabulary continuous speech recognition," IEEE Trans. Speech Audio Process., 2001.
[6] G. Ruske et al., "Impact of word graph density on the quality of posterior probability based confidence measures," in Proc. INTERSPEECH, 2003.
[7] M. Thomae et al., "A one-stage decoder for interpretation of natural speech," in Proc. International Conference on Natural Language Processing and Knowledge Engineering, 2003.
[8] W. H. Ward et al., "A concept graph based confidence measure," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2002.