Detection of Control Structures in Spoken Utterances

State-of-the-art intelligent assistants such as Siri and Cortana do not consider control structures in user input. They react reliably to ordinary commands, but their architectures are not designed to cope with queries that require complex control flow. We propose a system to overcome this limitation. Our approach explicitly models if-then-else, loop, and concurrency constructs in spoken utterances, bridging the gap between linguistic and programmatic semantics. To demonstrate the concept, we apply a rule-based approach: we have implemented three prototypes that use keyphrases to discover potential control structures. The full structures, however, are determined differently depending on the type of control structure: for conditionals we use chunk and part-of-speech tags provided by natural language processing tools; for loops and concurrency we use an action extraction approach based on semantic role labeling. Additionally, we use coreference information to determine the extent of each structure. The explicit modeling of conditionals, loops, and concurrent sections allows us to evaluate the accuracy of our approaches independently of each other and of other language understanding tasks. We have conducted two user studies in the domain of humanoid robotics. The first focused on conditionals; our prototype achieves F1 scores from 0.783 (automatic speech recognition) to 0.898 (manual transcripts) on unrestricted utterances. In the second, the prototypes for loop and concurrency detection also proved useful: F1 scores range from 0.588 (automatic speech recognition) to 0.814 (manual transcripts) for loops, and from 0.622 (automatic speech recognition) to 0.842 (manual transcripts) for concurrent sections.
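The first stage of the rule-based approach described above, spotting keyphrases to flag potential control structures before type-specific analysis, can be sketched roughly as follows. This is a minimal illustration only: the keyphrase lists and function name are assumptions for demonstration, not the paper's actual lexicon or implementation.

```python
# Hypothetical keyphrase lists per control-structure type (illustrative only;
# the paper's prototypes use their own, more elaborate keyphrase sets).
KEYPHRASES = {
    "conditional": ["if", "when", "in case", "otherwise", "else"],
    "loop": ["until", "while", "as long as", "repeat"],
    "concurrency": ["meanwhile", "at the same time", "in parallel", "simultaneously"],
}

def spot_control_structures(utterance: str) -> set:
    """Return the control-structure types whose keyphrases occur in the utterance."""
    # Pad with spaces so phrases match only on word boundaries.
    padded = f" {utterance.lower()} "
    return {
        structure
        for structure, phrases in KEYPHRASES.items()
        if any(f" {phrase} " in padded for phrase in phrases)
    }
```

A detected type would then be passed to the corresponding type-specific analysis (chunk/POS tags for conditionals, semantic role labeling for loops and concurrency) to determine the full extent of the structure.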
