The AT&T Speech API: a study on practical challenges for customized speech to text service

AT&T has recently opened its extensive portfolio of state-of-the-art speech technology to external developers as a platform called “The AT&T Speech API”. This study discusses a series of practical challenges encountered in an industrial deployment of speech to text services. In particular, we examine different strategies for customizing the speech to text process in an industry-grade implementation, using either intrinsic factors, which are inherent to the audio signal, or extrinsic factors, which are available from other sources.
