Intelligent Speech Synthesis as Part of an Integrated Speech Synthesis / Automatic Speech Recognition System

INTRODUCTION

We are concerned with speech synthesis as an integral part of an interactive database inquiry system. Several of the chapters in this book deal with such systems (e.g. Morel, Ch. 24; Waterworth, Ch. 25; Proctor & Young, Ch. 29; Ostler, Ch. 31; Carbonell & Pierre, Ch. 32). Our approach differs from most current synthesis methods in that it permits input both from text and from concepts, and contains within it intelligent elements.

Database inquiry systems usually have an automatic speech recognition input, some intelligent access to the database, and a speech synthesis output. A typical system, including its human user, is shown in Figure 22-1. In most systems the top-level intelligent section outputs text, which is then turned into speech by synthesis. Text, however, fails to encode much of the semantic information which might assist in producing a more natural output and allowing a more convincing interaction with the human user. Many deficiencies, particularly in the prosodics, could be overcome if the information discarded in basing the synthesis on text were retained. This is especially true in subtle areas of communication, such as mood and attitude (Hunt, Ch. 21).
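To make the contrast concrete, the sketch below illustrates the difference between a text-only interface and a concept-level interface between the intelligent section and the synthesizer. It is not drawn from the system described in this chapter; the field names and the toy prosody rules are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Text-only interface: the intelligent layer hands the synthesizer a bare
# string, so dialogue-level information (what is new, what attitude is
# intended) has already been discarded before synthesis begins.
@dataclass
class TextRequest:
    text: str

# Concept-level interface (hypothetical field names): the same utterance is
# passed together with semantic and attitudinal information that a prosody
# module could exploit.
@dataclass
class ConceptRequest:
    text: str
    dialogue_act: str                                       # e.g. "confirm", "query"
    focus_words: List[str] = field(default_factory=list)    # words carrying new information
    attitude: str = "neutral"                                # e.g. "apologetic", "confident"

def naive_prosody_hints(request: ConceptRequest) -> List[str]:
    """Derive crude prosodic hints from the concept-level information.

    Only an illustration of the argument: with these extra fields available,
    accent placement and overall tune can be chosen from the dialogue context
    rather than guessed from the text alone.
    """
    hints = []
    for word in request.text.split():
        if word.strip(",.?") in request.focus_words:
            hints.append(f"accent({word})")
    if request.dialogue_act == "query":
        hints.append("final_rise")
    if request.attitude == "apologetic":
        hints.append("lowered_pitch_range")
    return hints

if __name__ == "__main__":
    req = ConceptRequest(
        text="The last train to Norwich leaves at ten fifteen.",
        dialogue_act="confirm",
        focus_words=["ten", "fifteen"],    # the newly supplied information
    )
    print(naive_prosody_hints(req))        # ['accent(ten)', 'accent(fifteen.)']
```

A synthesizer driven from the `ConceptRequest` side of this interface can mark the focused items and choose an appropriate tune directly, whereas one driven from `TextRequest` must attempt to reconstruct such decisions from the surface text, which is precisely the loss of information the chapter argues against.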