Aren Jansen | Marco Tagliasacchi | Dotan Emanuel | Joel Shor | Oran Lang | Omry Tuval | Yinnon A. Haviv | Felix de Chaumont Quitry | Ira Shavitt | Ronnie Maor