From communicative context to speech: Integrating dialogue processing, speech production and natural language generation

Abstract: This article discusses the problem of selecting appropriate intonation in Person-Machine dialogues, such as those expected in intelligent information systems that support, for example, information retrieval. It proposes an approach that integrates two paradigms that have so far remained largely separate, automatic natural language generation and speech synthesis, in a Person-Machine dialogue scenario. The article introduces the two independent base components adopted in the approach, a dialogue model for information retrieval (COR) and a text generation system for German (KOMET-PENMAN), and develops from these a communicative-context-to-speech system architecture. This architecture provides for the flexible and context-appropriate selection of intonation patterns. The article argues that such an approach closes some of the well-known gaps in both text-to-speech and concept-to-speech systems.
