Paper information - Large lexica for speech-to-speech translation: from specification to creation

This paper presents the corpus collection and lexicon creation for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS), both of which are needed in speech-to-speech translation (SST). These lexica will be specified, built and validated within the scope of the EU project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components) during the years 2002-2005. Large lexica with phonetic, prosodic and morpho-syntactic content will be provided, together with well-documented specifications, for at least 12 languages [1]. The paper gives a short overview of speech-to-speech translation lexica in general and a summary of the LC-STAR project itself. More detailed information about the specifications for corpus collection and word extraction, as well as the specification and format of the lexica, is presented in later sections.

[1] Juha Iso-Sipilä, et al. Investigation and Analysis on Designing Chinese Balance Corpus, 2002.

[2] Lori Lamel, et al. The Use of Lexica in Automatic Speech Recognition, 2000.