Looking at the Last Two Turns, I'd Say This Dialogue Is Doomed - Measuring Dialogue Success

Two sets of linguistic features are developed. The first estimates whether a single step in a dialogue between a human and a machine is successful; the second classifies dialogues as a whole. The features are based on part-of-speech (POS) labels, word statistics, and properties of turns and dialogues. Experiments were carried out on the SympaFly corpus, which contains data from a real application in the flight booking domain. A single dialogue step could be classified with an accuracy of 83% (class-wise averaged recognition rate). The recognition rate for whole dialogues was 85%.
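
The class-wise averaged recognition rate reported above can be illustrated with a minimal sketch. It assumes the term denotes the unweighted mean of the per-class recall values, so that a rare class counts as much as a frequent one. The function name, the class labels, and the toy data are hypothetical and are not taken from the SympaFly experiments.

```python
from collections import defaultdict


def classwise_averaged_recognition_rate(y_true, y_pred):
    """Return the unweighted mean of per-class recall.

    Assumption: this matches the "class-wise averaged recognition rate";
    each class contributes equally, regardless of how many items it has.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled item is required")

    total = defaultdict(int)    # number of items per reference class
    correct = defaultdict(int)  # correctly recognised items per reference class
    for reference, predicted in zip(y_true, y_pred):
        total[reference] += 1
        if reference == predicted:
            correct[reference] += 1

    # Average the per-class recall values with equal weight per class.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Made-up toy labels: 8 successful steps, 2 failed steps.
y_true = ["success"] * 8 + ["failure"] * 2
y_pred = ["success"] * 8 + ["success", "failure"]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
classwise = classwise_averaged_recognition_rate(y_true, y_pred)
print(f"overall recognition rate:    {overall:.2f}")
print(f"class-wise averaged rate:    {classwise:.2f}")
```

On skewed data such as this toy example, the class-wise averaged rate is lower than the overall rate because the errors fall on the smaller class, which is why the measure is a useful complement when successful and unsuccessful steps are unevenly distributed.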