Telling questions from statements in spoken dialogue systems

To date, just about every system that successfully communicates with humans using language owes a great deal of its success to one particular set of characteristics: the ability to control or make assumptions about the linguistic context, the dialogue style and the physical situation in which the dialogue occurs. This ability is essential to systems ranging from early text-based systems such as Eliza (Weizenbaum, 1966) to modern speech-based question-answering systems such as Siri, acquired by Apple in 2010 and launched on the company's smartphones in 2011. Controlling or predicting context, style and situation allows us to build domain-dependent systems – systems designed to handle a very small subset of the dialogues in which humans take part. One of the motivations for spoken dialogue system research is an improved understanding of human interaction. But demonstrating an understanding of human interaction by mimicking it in a machine requires that we do not constrain the domain any more than humans do. A widely held hope is that domain independence can be reached by (1) gradually increasing in-domain coverage, (2) gradually widening the domains, (3) adding new domains, (4) developing robust domain detection, and (5) developing methods to move seamlessly from one domain to another. The soothing idea is that progress can be gradual, and that each small improvement adds to the improvement of the whole. When subjected to closer scrutiny, however, the hope of gradually building ourselves out of domain dependence may be overly optimistic. Some distinctions that seem trivial to humans and spoken dialogue systems alike may turn out to be unexpectedly difficult when we relinquish control over context, style and situation. We suspect that the simple distinction between question and statement is an example of this.