Multi-channel spectrograms for speech processing applications using deep learning methods

Elmar Nöth | Juan Camilo Vásquez-Correa | Juan Rafael Orozco-Arroyave | Tomas Arias-Vergara | Maria Schuster | Philipp Klumpp

[1] Dimitri Palaz et al. Analysis of CNN-based speech recognition system using raw speech as input, 2015, INTERSPEECH.

[2] DeLiang Wang et al. Deep neural network based spectral feature mapping for robust speech recognition, 2015, INTERSPEECH.

[3] Rajib Rana et al. Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends, 2020, arXiv.

[4] Yoshua Bengio et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[5] Gaël Varoquaux et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[6] Elmar Nöth et al. Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson's Disease, 2017, INTERSPEECH.

[7] Emmanuel Vincent et al. Audio Source Separation and Speech Enhancement, 2018.

[8] Yu Tsao et al. Complex spectrogram enhancement by convolutional neural network with multi-metrics learning, 2017, 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP).

[9] Archontis Politis et al. Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features, 2018, 2018 International Joint Conference on Neural Networks (IJCNN).

[10] Elmar Nöth et al. Phone-Attribute Posteriors to Evaluate the Speech of Cochlear Implant Users, 2019, INTERSPEECH.

[11] Musaed Alhussein et al. Voice Pathology Detection Using Deep Learning on Mobile Healthcare Framework, 2018, IEEE Access.

[12] Haibo Mi et al. Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network, 2018, PCM.

[13] Elmar Nöth et al. Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech, 2019, INTERSPEECH.

[14] Louis B. Rall et al. Automatic differentiation, 1981.

[15] Richard M. Stern et al. Speech recognition from GSM codec parameters, 1998, ICSLP.

[16] Elmar Nöth et al. Phonological i-Vectors to Detect Parkinson's Disease, 2018, TSD.

[17] Gerald Penn et al. Convolutional Neural Networks for Speech Recognition, 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[18] Milos Cernak et al. Nasal Speech Sounds Detection Using Connectionist Temporal Classification, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[19] Tara N. Sainath et al. Deep Learning for Audio Signal Processing, 2019, IEEE Journal of Selected Topics in Signal Processing.

[20] Wolfgang Wahlster et al. Verbmobil: Foundations of Speech-to-Speech Translation, 2000, Artificial Intelligence.

[21] Elmar Nöth et al. Multi-channel Convolutional Neural Networks for Automatic Detection of Speech Deficits in Cochlear Implant Users, 2019, CIARP.

[22] Emmanuel Vincent et al. Time-Frequency Processing: Spectral Properties, 2018, Audio Source Separation and Speech Enhancement.

[23] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[24] Alex Graves et al. Supervised Sequence Labelling with Recurrent Neural Networks, 2012, Studies in Computational Intelligence.

[25] Vipul Arora et al. Phonological Feature Based Mispronunciation Detection and Diagnosis Using Multi-Task DNNs and Active Learning, 2017, INTERSPEECH.

[26] Luca Antiga et al. Automatic differentiation in PyTorch, 2017.

[27] Elmar Nöth et al. Consonant-to-Vowel/Vowel-to-Consonant Transitions to Analyze the Speech of Cochlear Implant Users, 2019, TSD.

[28] R. Patterson et al. Complex Sounds and Auditory Images, 1992.

[29] Andreas Wendemuth et al. Recognition of emotional speech with convolutional neural networks by means of spectral estimates, 2017, 2017 Seventh International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW).

[30] Malcolm Slaney et al. An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank, 1997.

[31] Xue Chen et al. Probabilistic Shaping QC-LDPC Coded Modulation Scheme for Optical Fiber Systems, 2018, 2018 Conference on Lasers and Electro-Optics Pacific Rim (CLEO-PR).

[32] Juan R. Orozco-Arroyave et al. Analysis of speech of people with Parkinson's disease, 2016.

[33] Kuldip K. Paliwal et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.

[34] John J. Soraghan et al. A Deep Learning Method for Pathological Voice Detection Using Convolutional Deep Belief Networks, 2018, INTERSPEECH.

[35] J. R. Orozco-Arroyave et al. Speech differences between CI users with pre- and postlingual onset of deafness detected by speech processing methods on voiceless to voice transitions, 2019, Abstract- und Posterband – 90. Jahresversammlung der Deutschen Gesellschaft für HNO-Heilkunde, Kopf- und Hals-Chirurgie e.V., Bonn – Digitalisierung in der HNO-Heilkunde.

[36] Simon Haykin et al. Gradient-Based Learning Applied to Document Recognition, 2001.

[37] Sriram Ganapathy et al. 3-D CNN Models for Far-Field Multi-Channel Speech Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).