A Temporal Dependency Based Multi-modal Active Learning Approach for Audiovisual Event Detection

In this work, two novel active learning approaches for the annotation and detection of audiovisual events are proposed. Both rest on the assumption that events are likely to deviate substantially from the distribution of normal observations and should therefore lie in regions of low density. An event detection model can thus be trained more efficiently by focusing on samples that appear inconsistent with the majority of the dataset. The first approach is a uni-modal method that uses rank aggregation to select informative samples, which have previously been ranked by different unsupervised outlier detection techniques in combination with an uncertainty sampling technique. The information used for sample selection stems from a single modality (e.g. the video channel). Most active learning approaches focus on one target channel when selecting informative samples and therefore do not exploit potentially useful, complementary information in correlated modalities. We therefore propose an extension of the uni-modal approach to multiple modalities. From a target pool of instances belonging to a specific modality, the uni-modal approach is used to select a set of informative instances, which are then labelled manually. In addition, a second set of automatically labelled instances from the target pool is generated by transferring information from an auxiliary modality that is temporally dependent on the target one. Both sets of labelled instances (automatically and manually labelled) are used for the semi-supervised training of a classification model, which is then used in the next active learning iteration. Both methods have been assessed on a set of participants selected from the UUlmMAC dataset and have proven effective in substantially reducing the cost of manual annotation required to train a facial event detection model. The assessment is based on two different methods: Support Vector Data Description and expected similarity estimation. Furthermore, given an appropriate sampling approach, the multi-modal approach outperforms its uni-modal counterpart in most cases.
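
The uni-modal selection step can be illustrated with a minimal sketch. The abstract does not name the rank aggregation scheme, so a simple Borda count is assumed here, and the function names (`ranks_descending`, `borda_select`) and the toy scores are hypothetical. Each score list stands for one ranking criterion (two outlier detectors and one uncertainty measure), with higher values meaning "more informative".

```python
import numpy as np


def ranks_descending(scores):
    """Return the rank position of each sample (0 = highest score)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(len(order))
    return ranks


def borda_select(score_lists, batch_size):
    """Aggregate several rankings with a Borda count (an assumed scheme)
    and return the indices of the `batch_size` best-ranked samples."""
    n = len(score_lists[0])
    points = np.zeros(n)
    for scores in score_lists:
        # The top-ranked sample of each criterion earns n - 1 points,
        # the last-ranked sample earns 0.
        points += (n - 1) - ranks_descending(scores)
    return np.argsort(-points, kind="stable")[:batch_size]


# Toy scores for four unlabelled samples (illustrative values only).
outlier_a = [0.1, 0.9, 0.2, 0.8]    # e.g. scores from a first outlier detector
outlier_b = [0.2, 0.7, 0.1, 0.9]    # e.g. scores from a second outlier detector
uncertainty = [0.3, 0.8, 0.1, 0.6]  # e.g. classifier uncertainty

selected = borda_select([outlier_a, outlier_b, uncertainty], batch_size=2)
print(selected.tolist())
```

In an active learning loop, the selected indices would be passed to the annotator, and the model would be retrained before the next round of ranking.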
