SoundNet: Learning Sound Representations from Unlabeled Video
[1] Yann LeCun, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[2] Laurens van der Maaten, et al. Visualizing Data using t-SNE, 2008, JMLR.
[3] Honglak Lee, et al. Unsupervised feature learning for audio classification using convolutional deep belief networks, 2009, NIPS.
[4] Yusuf Aytar, et al. Tabula rasa: Model transfer for object category detection, 2011, 2011 International Conference on Computer Vision.
[5] Thierry Bertin-Mahieux, et al. The Million Song Dataset, 2011, ISMIR.
[6] Jiquan Ngiam, et al. Multimodal Deep Learning, 2011, ICML.
[7] Daniel Kuettel, et al. Figure-ground segmentation by transferring window masks, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[8] Alex Krizhevsky, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[9] Jing Huang, et al. Audio-visual deep learning for noise robust speech recognition, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[10] Gerard Roma, et al. Recurrence quantification analysis features for environmental sound recognition, 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
[11] Chao-Yeh Chen, et al. Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[12] Aäron van den Oord, et al. Deep content-based music recommendation, 2013, NIPS.
[13] Awni Hannun, et al. Deep Speech: Scaling up end-to-end speech recognition, 2014, ArXiv.
[14] Bolei Zhou, et al. Learning Deep Features for Scene Recognition using Places Database, 2014, NIPS.
[15] Ilya Sutskever, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[16] Jimmy Ba, et al. Do Deep Nets Really Need to be Deep?, 2013, NIPS.
[17] Jacob Walker, et al. Dense Optical Flow Prediction from a Static Image, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[18] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[19] Karol J. Piczak. Environmental sound classification with convolutional neural networks, 2015, 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP).
[20] Ian McLoughlin, et al. Robust Sound Event Classification Using Deep Neural Networks, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[21] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[22] Daniele Barchiesi, et al. Acoustic Scene Classification: Classifying environments from the sounds they produce, 2014, IEEE Signal Processing Magazine.
[23] Jonathan Long, et al. Fully convolutional networks for semantic segmentation, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Karol J. Piczak. ESC: Dataset for Environmental Sound Classification, 2015, ACM Multimedia.
[25] Diederik P. Kingma, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[26] Justin Salamon, et al. Unsupervised feature learning for urban sound classification, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Emre Cakir, et al. Polyphonic sound event detection using multi label deep neural networks, 2015, 2015 International Joint Conference on Neural Networks (IJCNN).
[28] Alain Rakotomamonjy, et al. Histogram of gradients of Time-Frequency Representations for Audio scene detection, 2015, ArXiv.
[29] Yusuf Aytar, et al. Part level transfer regularization for enhancing exemplar SVMs, 2015.
[30] Olga Russakovsky, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[31] Karen Simonyan, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[32] Lars Hertel, et al. Comparing time and frequency domain for audio event recognition using deep learning, 2016, 2016 International Joint Conference on Neural Networks (IJCNN).
[33] Bart Thomee, et al. YFCC100M: The New Data in Multimedia Research, 2015, Commun. ACM.
[34] Lluís Castrejón, et al. Learning Aligned Cross-Modal Representations from Weakly Aligned Data, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Saurabh Gupta, et al. Cross Modal Distillation for Supervision Transfer, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Carl Vondrick, et al. Anticipating Visual Representations from Unlabeled Video, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Andrew Owens, et al. Visually Indicated Sounds, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Carl Vondrick, et al. Generating Videos with Scene Dynamics, 2016, NIPS.
[39] Phuc Xuan Nguyen, et al. The Open World of Micro-Videos, 2016, ArXiv.
[40] Shawn Hershey, et al. CNN architectures for large-scale audio classification, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] Tuomas Virtanen, et al. Detection and Classification of Acoustic Scenes and Events, 2018.