ACM HotMobile 2013 demo: NLify: mobile spoken natural language interfaces for everyone

Speech has become an attractive means of interacting with the phone. When speech-enabled interactions are few, keyword-based interfaces [1], which require users to remember precise invocations, are adequate. As the number of such interactions grows, users are more likely to forget keywords, and spoken natural language (SNL) interfaces, which let users express their functional intent without conforming to a rigid syntax, become desirable. Prominent “first-party” systems such as Siri and Google Voice Search offer such functionality in select domains today. In this demo, we present NLify, a system that enables any (“third-party”) developer to add an SNL interface to their application. The key challenge is that even a simple command can be phrased in many different ways. Worse, noise in speech recognition introduces additional variability. To address this challenge, we use web-scale crowdsourcing and automated statistical machine paraphrasing to help developers cover much of the possible input space. In addition, we use a statistical language model [2] instead of a deterministic one, since it is more tolerant of missing or reordered words. Figure 2 illustrates the overall architecture of NLify. NLify is fully integrated into the Windows Phone 8 development process as a Visual Studio extension, a snapshot of which is shown in Figure 1. A quantitative evaluation shows that NLify achieves an overall recognition rate of 85% across intents.
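To illustrate why a statistical model tolerates missing or reordered words better than a deterministic grammar, the following is a minimal sketch and not NLify's actual implementation. It assumes a smoothed unigram (bag-of-words) model per intent; the intent names, example phrasings, and the functions `train`, `log_likelihood`, and `classify` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical intents with example phrasings, standing in for the
# developer-supplied, crowdsourced, and paraphrased training sentences.
TRAINING = {
    "check_weather": [
        "what is the weather tomorrow",
        "will it rain tomorrow",
        "tell me the forecast for tomorrow",
    ],
    "set_alarm": [
        "wake me up at seven",
        "set an alarm for seven",
        "set the alarm to seven in the morning",
    ],
}


def train(examples):
    """Count word occurrences per intent and collect the shared vocabulary."""
    counts = {
        intent: Counter(word for phrase in phrases for word in phrase.split())
        for intent, phrases in examples.items()
    }
    vocab = {word for counter in counts.values() for word in counter}
    return counts, vocab


def log_likelihood(words, counter, vocab_size):
    """Add-one smoothed unigram log-likelihood of the words under one intent.

    Word order is ignored, and unseen words receive a small nonzero
    probability instead of causing an outright rejection.
    """
    total = sum(counter.values())
    return sum(
        math.log((counter[word] + 1) / (total + vocab_size + 1))
        for word in words
    )


def classify(utterance, counts, vocab):
    """Return the intent whose model assigns the utterance the highest score."""
    words = utterance.lower().split()
    scores = {
        intent: log_likelihood(words, counter, len(vocab))
        for intent, counter in counts.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    counts, vocab = train(TRAINING)
    print(classify("tomorrow weather what", counts, vocab))  # reordered words
    print(classify("alarm seven", counts, vocab))            # missing words
```

A deterministic grammar that expects "what is the weather tomorrow" would reject both test utterances, whereas the scored model still ranks the intended intent highest. A practical system would likely use higher-order n-grams and a rejection threshold for out-of-domain speech; both are omitted here for brevity.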