Paper Information - Evaluating Voice Interaction Pipelines at the Edge

With the recent releases of Alexa Voice Services and Google Home, voice-driven interactive computing is quickly becoming commonplace. Voice interactive applications incorporate multiple components, including complex speech recognition and translation algorithms, natural language understanding and generation capabilities, and custom compute functions commonly referred to as skills. Voice-driven interactive systems are built as software pipelines composed of these components. These pipelines are typically resource intensive and must execute quickly to maintain dialogue-consistent latencies. Consequently, voice interaction pipelines are usually computed entirely in the cloud. In many cases, however, cloud connectivity is not practical, and the pipelines must instead be executed at the edge. In this paper, we evaluate the impact of pushing voice-driven pipelines to computationally weak edge devices. Our primary motivation is to enable voice-driven interfaces for first responders during emergencies, such as building fires, when connectivity to the cloud is impractical. We first characterize the end-to-end performance of a complete open source voice interaction pipeline for four different configurations, ranging from entirely cloud-based to completely edge-based. We also identify potential optimization opportunities that would enable voice-driven interaction pipelines to be fully executed on computationally weak edge devices at lower response latencies than high-performance cloud services.

Matthew E. Tolentino | Smruthi Sridhar
