P-vectors: A Parallel-Coupled TDNN/Transformer Network for Speaker Verification
Bo Xu | Liang Xu | Jing Xiao | Xiyuan Wang | Fangyuan Wang
[1] Joon Son Chung, et al. VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge, 2023, ArXiv.
[2] Chengming Liu, et al. Global–Local Self-Attention Based Transformer for Speaker Verification, 2022, Applied Sciences.
[3] A. Etemad, et al. Fine-grained Early Frequency Attention for Deep Speaker Recognition, 2022, 2022 International Joint Conference on Neural Networks (IJCNN).
[4] Y. Qian, et al. Local Information Modeling with Self-Attention for Speaker Verification, 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Lantian Li, et al. Reliable Visualization for Deep Speaker Recognition, 2022, INTERSPEECH.
[6] Haibin Wu, et al. MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification, 2022, INTERSPEECH.
[7] Rohan Kumar Das, et al. MFA: TDNN with Multi-Scale Frequency-Channel Attention for Text-Independent Speaker Verification with Short Utterances, 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Fangyuan Wang, et al. MACCIF-TDNN: Multi Aspect Aggregation of Channel and Context Interdependence Features in TDNN-Based Speaker Verification, 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[9] Qingyang Hong, et al. Additive Phoneme-aware Margin Softmax Loss for Language Recognition, 2021, INTERSPEECH.
[10] Ming-Ming Cheng, et al. LayerCAM: Exploring Hierarchical Class Activation Maps for Localization, 2021, IEEE Transactions on Image Processing.
[11] Yaowei Wang, et al. Conformer: Local Features Coupling Global Representations for Visual Recognition, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[12] Kris Demuynck, et al. Integrating Frequency Translational Invariance in TDNNs and Frequency Positional Information in 2D ResNets to Enhance Speaker Verification, 2021, INTERSPEECH.
[13] Shinji Watanabe, et al. Recent Developments on ESPnet Toolkit Boosted By Conformer, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Haizhou Li, et al. Speaker-Utterance Dual Attention for Speaker and Utterance Verification, 2020, INTERSPEECH.
[15] S Umesh, et al. S-vectors: Speaker Embeddings based on Transformer's Encoder for Text-Independent Speaker Verification, 2020, ArXiv.
[16] Pooyan Safari, et al. Self-attention encoding and pooling for speaker recognition, 2020, INTERSPEECH.
[17] Yu Zhang, et al. Conformer: Convolution-augmented Transformer for Speech Recognition, 2020, INTERSPEECH.
[18] Kris Demuynck, et al. ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification, 2020, INTERSPEECH.
[19] Ian McLoughlin, et al. An Effective Deep Embedding Learning Architecture for Speaker Verification, 2019, INTERSPEECH.
[20] Kai Zhao, et al. Res2Net: A New Multi-Scale Backbone Architecture, 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[21] Joon Son Chung, et al. VoxCeleb2: Deep Speaker Recognition, 2018, INTERSPEECH.
[22] Gang Sun, et al. Squeeze-and-Excitation Networks, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[23] Lukás Burget, et al. Analysis of Score Normalization in Multilingual Speaker Recognition, 2017, INTERSPEECH.
[24] Sanjeev Khudanpur, et al. Deep Neural Network Embeddings for Text-Independent Speaker Verification, 2017, INTERSPEECH.
[25] Joon Son Chung, et al. VoxCeleb: A Large-Scale Speaker Identification Dataset, 2017, INTERSPEECH.
[26] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[27] Sanjeev Khudanpur, et al. A study on data augmentation of reverberant speech for robust speech recognition, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Daniel Povey, et al. MUSAN: A Music, Speech, and Noise Corpus, 2015, ArXiv.
[29] Leslie N. Smith, et al. Cyclical Learning Rates for Training Neural Networks, 2015, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).
[30] Patrick Kenny, et al. Front-End Factor Analysis for Speaker Verification, 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[31] Geoffrey E. Hinton, et al. Phoneme recognition using time-delay neural networks, 1989, IEEE Trans. Acoust. Speech Signal Process.