Multi-channel spectrograms for speech processing applications using deep learning methods

[1] Dimitri Palaz et al. Analysis of CNN-based speech recognition system using raw speech as input, 2015, INTERSPEECH.

[2] DeLiang Wang et al. Deep neural network based spectral feature mapping for robust speech recognition, 2015, INTERSPEECH.

[3] Rajib Rana et al. Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends, 2020, ArXiv.

[4] Yoshua Bengio et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[5] Gaël Varoquaux et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[6] Elmar Nöth et al. Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson's Disease, 2017, INTERSPEECH.

[7] Emmanuel Vincent et al. Audio Source Separation and Speech Enhancement, 2018.

[8] Yu Tsao et al. Complex spectrogram enhancement by convolutional neural network with multi-metrics learning, 2017, 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP).

[9] Archontis Politis et al. Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features, 2018, 2018 International Joint Conference on Neural Networks (IJCNN).

[10] Elmar Nöth et al. Phone-Attribute Posteriors to Evaluate the Speech of Cochlear Implant Users, 2019, INTERSPEECH.

[11] Musaed Alhussein et al. Voice Pathology Detection Using Deep Learning on Mobile Healthcare Framework, 2018, IEEE Access.

[12] Haibo Mi et al. Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network, 2018, PCM.

[13] Elmar Nöth et al. Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech, 2019, INTERSPEECH.

[14] Louis B. Rall et al. Automatic differentiation, 1981.

[15] Richard M. Stern et al. Speech recognition from GSM codec parameters, 1998, ICSLP.

[16] Elmar Nöth et al. Phonological i-Vectors to Detect Parkinson's Disease, 2018, TSD.

[17] Gerald Penn et al. Convolutional Neural Networks for Speech Recognition, 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[18] Milos Cernak et al. Nasal Speech Sounds Detection Using Connectionist Temporal Classification, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[19] Tara N. Sainath et al. Deep Learning for Audio Signal Processing, 2019, IEEE Journal of Selected Topics in Signal Processing.

[20] Wolfgang Wahlster et al. Verbmobil: Foundations of Speech-to-Speech Translation, 2000, Artificial Intelligence.

[21] Elmar Nöth et al. Multi-channel Convolutional Neural Networks for Automatic Detection of Speech Deficits in Cochlear Implant Users, 2019, CIARP.

[22] Emmanuel Vincent et al. Time-Frequency Processing: Spectral Properties, 2018, Audio Source Separation and Speech Enhancement.

[23] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[24] Alex Graves et al. Supervised Sequence Labelling with Recurrent Neural Networks, 2012, Studies in Computational Intelligence.

[25] Vipul Arora et al. Phonological Feature Based Mispronunciation Detection and Diagnosis Using Multi-Task DNNs and Active Learning, 2017, INTERSPEECH.

[26] Luca Antiga et al. Automatic differentiation in PyTorch, 2017.

[27] Elmar Nöth et al. Consonant-to-Vowel/Vowel-to-Consonant Transitions to Analyze the Speech of Cochlear Implant Users, 2019, TSD.

[28] R. Patterson et al. Complex Sounds and Auditory Images, 1992.

[29] Andreas Wendemuth et al. Recognition of emotional speech with convolutional neural networks by means of spectral estimates, 2017, 2017 Seventh International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW).

[30] Malcolm Slaney et al. An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank, 1997.

[31] Xue Chen et al. Probabilistic Shaping QC-LDPC Coded Modulation Scheme for Optical Fiber Systems, 2018, 2018 Conference on Lasers and Electro-Optics Pacific Rim (CLEO-PR).

[32] Juan R. Orozco-Arroyave et al. Analysis of speech of people with Parkinson's disease, 2016.

[33] Kuldip K. Paliwal et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.

[34] John J. Soraghan et al. A Deep Learning Method for Pathological Voice Detection Using Convolutional Deep Belief Networks, 2018, INTERSPEECH.

[35] J. R. Orozco-Arroyave et al. Speech differences between CI users with pre- and postlingual onset of deafness detected by speech processing methods on voiceless to voice transitions, 2019, Abstract- und Posterband – 90. Jahresversammlung der Deutschen Gesellschaft für HNO-Heilkunde, Kopf- und Hals-Chirurgie e.V., Bonn – Digitalisierung in der HNO-Heilkunde.

[36] Simon Haykin et al. Gradient-Based Learning Applied to Document Recognition, 2001.

[37] Sriram Ganapathy et al. 3-D CNN Models for Far-Field Multi-Channel Speech Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).