Naver at ActivityNet Challenge 2019 - Task B Active Speaker Detection (AVA)

This report describes our submission to the ActivityNet Challenge at CVPR 2019. We use a 3D convolutional neural network (CNN)-based front-end and an ensemble of temporal convolution and LSTM classifiers to predict whether a visible person is speaking. Our results show significant improvements over the baseline on the AVA-ActiveSpeaker dataset.
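
Although the report itself gives no implementation details, the architecture it names (a 3D CNN front-end feeding an ensemble of temporal convolution and LSTM classifiers) can be sketched in PyTorch. All module names, tensor shapes, kernel sizes, and the probability-averaging ensemble below are illustrative assumptions, not the authors' actual configuration:

```python
# Minimal sketch of the described architecture. Shapes and hyperparameters
# are assumptions for illustration only.
import torch
import torch.nn as nn


class Conv3DFrontEnd(nn.Module):
    """Hypothetical 3D-CNN front-end: video clip -> per-frame embeddings."""

    def __init__(self, embed_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(5, 7, 7), stride=(1, 2, 2),
                      padding=(2, 3, 3)),
            nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # pool space, keep time
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, clip):                      # clip: (B, 3, T, H, W)
        feats = self.conv(clip)                   # (B, 64, T, 1, 1)
        feats = feats.squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, 64)
        return self.proj(feats)                   # (B, T, embed_dim)


class TemporalConvHead(nn.Module):
    """1D temporal convolutions over the embedding sequence."""

    def __init__(self, embed_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(embed_dim, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(128, 1, kernel_size=1),
        )

    def forward(self, x):                         # x: (B, T, D)
        return self.net(x.transpose(1, 2)).squeeze(1)  # (B, T) logits


class LSTMHead(nn.Module):
    """Bidirectional LSTM over the embedding sequence."""

    def __init__(self, embed_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, 128, batch_first=True,
                            bidirectional=True)
        self.fc = nn.Linear(256, 1)

    def forward(self, x):                         # x: (B, T, D)
        out, _ = self.lstm(x)                     # (B, T, 256)
        return self.fc(out).squeeze(-1)           # (B, T) logits


# Ensemble by averaging per-frame speaking probabilities from both heads.
front_end = Conv3DFrontEnd()
heads = [TemporalConvHead(), LSTMHead()]
clip = torch.randn(2, 3, 16, 112, 112)  # dummy batch: 16 frames of 112x112
emb = front_end(clip)
probs = torch.stack([torch.sigmoid(h(emb)) for h in heads]).mean(0)  # (B, T)
```

Averaging sigmoid outputs is one simple ensembling scheme; the report does not specify how the two classifiers' predictions are actually combined.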
