Detecting the degree of involvement in conversations is important for summarization, retrieval, and browsing applications. In this paper, we define the degree of involvement as the interest level that a group of participants shows over the course of an interaction, and we propose a scheme for automatically detecting high-interest scenes based on multi-modal sensing. Our research is motivated by the fact that non-verbal information, such as gestures and facial expressions, plays an important role in face-to-face conversation. Audio-visual features from the entire group are obtained by sensors located in a meeting room, and topics are extracted by applying latent Dirichlet allocation (LDA) to these features. A support vector machine (SVM) is then used to infer the interest level from the topics. We conducted experiments using recordings of conversational scenes (2 hours 43 minutes in total) annotated with interest-level labels on a five-point scale. With interest levels of 4 or above labeled high and 3 or below labeled low, the inference model reaches a highest accuracy of 87.3%.
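The pipeline described above (LDA topics fed to an SVM, with five-point labels collapsed into high and low) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: it assumes the audio-visual features have already been quantized into per-scene count histograms, and the data, the number of topics, the RBF kernel, and the helper names `binarize_interest` and `build_model` are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def binarize_interest(labels, threshold=4):
    """Map five-point interest labels to 1 (high, >= threshold) or 0 (low)."""
    return (np.asarray(labels) >= threshold).astype(int)


def build_model(n_topics=5, seed=0):
    """Chain LDA (counts -> topic proportions) with an SVM classifier.

    The topic count and kernel are illustrative assumptions, not values
    taken from the paper.
    """
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    return make_pipeline(lda, SVC(kernel="rbf"))


# Synthetic stand-in data (NOT the paper's recordings): 60 scenes, each a
# histogram over 20 hypothetical quantized audio-visual "words".
rng = np.random.default_rng(0)
counts = rng.poisson(lam=3.0, size=(60, 20))
labels = rng.integers(1, 6, size=60)  # five-point scale, values 1..5

y = binarize_interest(labels)            # 4 or 5 -> high (1), 1 to 3 -> low (0)
model = build_model().fit(counts, y)     # LDA is fit first, then the SVM
predictions = model.predict(counts)      # one 0/1 prediction per scene
```

Because the data here is random, the predictions carry no meaning; the sketch only shows how the stages connect. A real evaluation would hold out scenes for testing rather than predicting on the training set.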