Distance-invariant automatic engagement level recognition using visual cues

In camera-based engagement level recognition, the face is the main source of cues, and its appearance in the image is affected by the distance between the camera and the user. In this paper, we present an automatic engagement level recognition method that performs stably regardless of the distance between the camera and the user. We describe in detail the process of obtaining distance-invariant cues and compare performance with and without this process. We also adopt a temporal pyramid structure to extract temporal statistical features and present a voting method for engagement level estimation. We report results and analysis on a database acquired in a real environment.
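
As a rough illustration of the temporal pyramid and voting ideas mentioned above, the following minimal sketch computes per-segment statistics over a one-dimensional cue signal and then combines per-window predictions by majority vote. The function names, the choice of mean and standard deviation as the statistics, the split of level k into 2^k segments, and the tie-breaking rule are all assumptions made for illustration; they are not taken from the method described in this paper.

```python
from collections import Counter
from statistics import mean, pstdev


def temporal_pyramid_features(signal, levels=3):
    """Concatenate per-segment mean and standard deviation over a pyramid.

    Assumed layout: level k splits the signal into 2**k contiguous
    segments of near-equal length (level 0 is the whole signal).
    """
    n = len(signal)
    if n < 2 ** (levels - 1):
        raise ValueError("signal too short for the requested number of levels")
    features = []
    for level in range(levels):
        parts = 2 ** level
        for i in range(parts):
            # Integer arithmetic keeps every segment non-empty when n >= parts.
            segment = signal[i * n // parts:(i + 1) * n // parts]
            features.append(mean(segment))
            features.append(pstdev(segment))  # population standard deviation
    return features


def vote_engagement_level(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    if not predictions:
        raise ValueError("no predictions to vote on")
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical usage: a short cue signal and three per-window predictions.
features = temporal_pyramid_features([1.0, 2.0, 3.0, 4.0], levels=2)
level = vote_engagement_level(["high", "low", "high"])
```

With `levels=2` the feature vector holds six values: the mean and standard deviation of the whole signal, followed by those of its first and second halves. In practice the per-window predictions would come from a classifier trained on such features, which is outside the scope of this sketch.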