Paper information - ChaLearn Joint Contest on Multimedia Challenges Beyond Visual Analysis: An overview

This paper provides an overview of the Joint Contest on Multimedia Challenges Beyond Visual Analysis. We organized an academic competition focused on four problems whose solution requires effective processing of multimodal information. Two tracks were devoted to gesture spotting and recognition from RGB-D video, two fundamental problems for human-computer interaction. Another track was devoted to a second round of the first impressions challenge, whose goal was to develop methods to recognize personality traits from short video clips. For this second round we adopted a novel collaborative-competitive (i.e., coopetition) setting. The fourth track was dedicated to the problem of video recommendation for improving user experience. The challenge was open for about 45 days and received outstanding participation: almost 200 participants registered for the contest, and 20 teams submitted predictions in the final stage. The main goals of the challenge were fulfilled: the state of the art was advanced considerably in the four tracks, with novel solutions to the proposed problems (mostly relying on deep learning). However, further research is still required. The data of the four tracks will be made available so that researchers can keep making progress on these problems.
