Attention, Sobriety Checkpoint! Can Humans Determine by Means of Voice, if Someone is Drunk... and Can Automatic Classifiers Compete?

This paper analyzes how well humans recognize drunk speakers by voice alone and compares their performance with that of an automatic statistical classifier. The study is carried out within the Interspeech 2011 Speaker State Challenge [1] using the Alcohol Language Corpus (ALC) [2]. The 79 human subjects achieved an average of 55.8% unweighted accuracy on a balanced set of intoxicated and non-intoxicated samples. The statistical classifier developed in this study reaches 66.6% unweighted accuracy on the test set; by comparison, the best-performing subject achieved 70.0%. Our classifier is based on 4368 acoustic and prosodic features. Adding linguistic features and performing feature selection by Information Gain Ratio (IGR) ranking yielded a 0.7% absolute improvement while reducing the feature space by 29%. Copyright © 2011 ISCA.
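The two measures named in the abstract, unweighted accuracy and IGR-based feature ranking, can be illustrated with a minimal sketch. Unweighted accuracy is taken here as the mean of the per-class recalls, so each class counts equally regardless of its size. IGR is computed as the information gain of a feature divided by the entropy of that feature's own value distribution. The sketch assumes the features have already been discretized (numeric acoustic features would need binning first). The function names, the class labels "AL"/"NAL" and the toy feature names are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls (unweighted average recall)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def information_gain_ratio(feature_values, labels):
    """IGR of one discrete (already binned) feature with respect to the labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    # Expected class entropy after splitting on the feature's values.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Normalize by the feature's own entropy ("split information").
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0


def rank_features(features, labels):
    """Return feature names sorted by descending IGR."""
    scores = {name: information_gain_ratio(vals, labels)
              for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    labels = ["AL", "AL", "NAL", "NAL"]            # hypothetical labels
    features = {"f0_bin": [1, 1, 0, 0],            # hypothetical binned features
                "noise": [0, 1, 0, 1]}
    print(rank_features(features, labels))         # ['f0_bin', 'noise']
    print(unweighted_accuracy(labels, ["AL", "NAL", "NAL", "NAL"]))  # 0.75
```

A feature subset can then be chosen by keeping the top-ranked names, for example by cutting the ranked list at a fixed size or score threshold; how the cut-off was chosen in this study is not specified here.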