Spoken Language Translator: First-Year Report

This document is the first-year report for a project whose long-term goal is to build a practically useful system that translates continuous spoken language within a restricted domain. The main deliverable from the first year is a prototype, the Spoken Language Translator (SLT), which translates queries from spoken English to spoken Swedish in the domain of air travel planning. The system was developed by SRI International, the Swedish Institute of Computer Science (SICS), and Telia Research AB. Most of it is built from existing pieces of software, adapted to the speech translation task with as few changes as possible.

The main components are connected in a pipeline. The input signal is processed by SRI's DECIPHER(TM), a speaker-independent continuous speech recognition system. It produces a set of speech hypotheses, which is passed to the English-language processor, the SRI Core Language Engine (CLE), a general natural-language processing system. The CLE grammar associates each speech hypothesis with a set of possible logical-form-like representations, typically producing 5 to 50 logical forms per hypothesis. A preference component then gives each logical form a numerical score reflecting its linguistic plausibility. The highest-scoring logical form is passed to the transfer component, which uses a set of simple non-deterministic recursive pattern-matching rules to rewrite it into a set of possible corresponding Swedish representations. The preference component is then invoked again to select the most plausible transferred logical form. The result is fed to a second copy of the CLE, which uses a Swedish-language grammar and lexicon developed at SICS to convert the form into a Swedish string and an associated syntax tree. Finally, the string and tree are passed to the Telia Prophon speech synthesizer, which uses polyphone synthesis to produce the spoken Swedish utterance.

The system's current performance figures, measured on previously unseen test data, are as follows. For sentences of 12 words or fewer, the top-scoring speech hypothesis is acceptable for 65% of all utterances. If the speech hypothesis is correct, a translation is produced in 80% of cases, and 90% of all translations produced are acceptable. Nearly all incorrect translations are incorrect because of errors in grammar or naturalness of expression; errors due to divergence in meaning between the source and target sentences account for less than 1% of all translations.

Making fairly conservative extrapolations from the current SLT prototype, we believe that simply continuing the basic development strategy could within three to five years produce an enhanced version, which recognized about 90% of the short sentences (12 words or fewer) in a specific domain, and produced accepta
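The language-processing part of the pipeline described above (analysis, preference, transfer, preference again, generation) can be sketched as a chain of functions. This is only an illustrative sketch, not the SLT implementation: speech recognition and synthesis are left out, and every name in it (`translate`, `select_best`, the `toy_*` functions, `TOY_LEXICON`, and the tuple representation of a logical form) is a hypothetical stand-in with arbitrary toy scoring.

```python
import itertools
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical stand-in for a logical form: (mood tag, content string).
LogicalForm = Tuple[str, str]


def select_best(candidates: Sequence[LogicalForm],
                score: Callable[[LogicalForm], float]) -> LogicalForm:
    """Preference step: return the highest-scoring candidate."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates, key=score)


def translate(hypotheses: Iterable[str],
              analyse: Callable[[str], List[LogicalForm]],
              score_source: Callable[[LogicalForm], float],
              transfer: Callable[[LogicalForm], List[LogicalForm]],
              score_target: Callable[[LogicalForm], float],
              generate: Callable[[LogicalForm], str]) -> str:
    """Run the text-to-text stages of the pipeline in sequence."""
    # Analysis: each speech hypothesis yields a set of candidate logical forms.
    source_forms = [lf for hyp in hypotheses for lf in analyse(hyp)]
    # First preference pass: keep the most plausible source logical form.
    best_source = select_best(source_forms, score_source)
    # Transfer is non-deterministic, so it returns several target candidates.
    target_forms = transfer(best_source)
    # Second preference pass: keep the most plausible transferred form.
    best_target = select_best(target_forms, score_target)
    # Generation: turn the chosen target form into an output string.
    return generate(best_target)


# Toy components below are assumptions made purely for illustration.
TOY_LEXICON = {"show": ["visa"], "flights": ["flyg", "flygningar"], "to": ["till"]}


def toy_analyse(hypothesis: str) -> List[LogicalForm]:
    return [("imperative", hypothesis), ("question", hypothesis)]


def toy_score_source(lf: LogicalForm) -> float:
    mood, text = lf
    return float(mood == "imperative") + float("boston" in text)


def toy_transfer(lf: LogicalForm) -> List[LogicalForm]:
    mood, text = lf
    # One list of alternatives per word; unknown words are copied unchanged.
    options = [TOY_LEXICON.get(word, [word]) for word in text.split()]
    return [(mood, " ".join(choice)) for choice in itertools.product(*options)]


def toy_score_target(lf: LogicalForm) -> float:
    return -float(len(lf[1]))  # arbitrary toy preference: shorter is better


def toy_generate(lf: LogicalForm) -> str:
    return lf[1].capitalize()


if __name__ == "__main__":
    hyps = ["show flights to austin", "show flights to boston"]
    print(translate(hyps, toy_analyse, toy_score_source,
                    toy_transfer, toy_score_target, toy_generate))
```

Passing each stage in as a function mirrors the report's point that the components are largely independent, pre-existing pieces connected in sequence; the same `select_best` step is reused on both the source and the target side, as the preference component is in the description above.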
