Abstract

It has been observed that human translators can work nearly four times as quickly, with little loss in accuracy, simply by dictating their translations rather than typing them. In this paper, we consider the integration of speech recognition into a translator's workstation. In particular, we show how to combine statistical models of speech, language, and translation into a single system that decodes a sequence of words in a target language from two inputs: a sequence of words in a source language and a spoken utterance of the target-language sequence. We present results demonstrating that the difficulty of the speech recognition task can be reduced by making use of information contained in the source text being translated.
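
The combination of the three models can be sketched as a single decoding criterion. This is an illustrative reading, not a formula taken from the abstract: the symbols $f$ (source-language text), $a$ (acoustic signal of the dictated translation), and $e$ (candidate target-language word sequence) are introduced here, and the sketch assumes that $f$ and $a$ are conditionally independent given $e$.

```latex
\begin{align}
\hat{e} &= \arg\max_{e} \; P(e \mid f, a) \\
        &= \arg\max_{e} \; \frac{P(e)\, P(f, a \mid e)}{P(f, a)} \\
        &= \arg\max_{e} \; P(e)\, P(f \mid e)\, P(a \mid e)
\end{align}
```

Under these assumptions, $P(e)$ plays the role of the language model, $P(f \mid e)$ the translation model, and $P(a \mid e)$ the acoustic model. The denominator $P(f, a)$ does not depend on $e$ and drops out of the maximization. Omitting the factor $P(f \mid e)$ would leave the usual criterion for plain dictation, which is one way to see how the source text can constrain the recognizer.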