Learning spectro-temporal representations of complex sounds with parameterized neural networks
[1] Nima Mesgarani, et al. Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations, 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[2] Josh H. McDermott, et al. Deep neural network models of sensory systems: windows onto the role of task constraints, 2019, Current Opinion in Neurobiology.
[3] Justin Salamon, et al. A Dataset and Taxonomy for Urban Sound Research, 2014, ACM Multimedia.
[4] Marco Cuturi, et al. Computational Optimal Transport: With Applications to Data Science, 2019.
[5] Zdravko Kacic, et al. A study of harmonic features for the speaker recognition, 1997, Speech Commun.
[6] Jean Carletta, et al. The AMI meeting corpus, 2005.
[7] S. Furukawa, et al. Cascaded Tuning to Amplitude Modulation for Natural Sound Recognition, 2019, The Journal of Neuroscience.
[8] Christoph E. Schreiner, et al. Human Superior Temporal Gyrus Organization of Spectrotemporal Modulation Tuning Derived from Speech Stimuli, 2016, The Journal of Neuroscience.
[9] Justin Salamon, et al. Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification, 2016, IEEE Signal Processing Letters.
[10] Pavel Korshunov, et al. pyannote.audio: Neural Building Blocks for Speaker Diarization, 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Antonio Criminisi, et al. Adaptive Neural Trees, 2018, ICML.
[12] Marco Cuturi. Sinkhorn Distances: Lightspeed Computation of Optimal Transport, 2013, NIPS.
[13] Shihab A. Shamma. Auditory cortical representation of complex acoustic spectra as inferred from the ripple analysis method, 1996.
[14] Frédéric E. Theunissen, et al. The Modulation Transfer Function for Speech Intelligibility, 2009, PLoS Comput. Biol.
[15] Mounya Elhilali, et al. Detection of speech tokens in noise using adaptive spectrotemporal receptive fields, 2015, 2015 49th Annual Conference on Information Sciences and Systems (CISS).
[16] T. Yarkoni, et al. Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning, 2017, Perspectives on Psychological Science: A Journal of the Association for Psychological Science.
[17] Geoffrey E. Hinton, et al. Lookahead Optimizer: k steps forward, 1 step back, 2019, NeurIPS.
[18] M. Sahani, et al. Editorial overview: Machine learning, big data, and neuroscience, 2019, Current Opinion in Neurobiology.
[19] Anne Hsu, et al. Tuning for spectro-temporal modulations as a mechanism for auditory discrimination of natural sounds, 2005, Nature Neuroscience.
[20] Liyuan Liu, et al. On the Variance of the Adaptive Learning Rate and Beyond, 2019, ICLR.
[21] J. Belliveau, et al. Short-term plasticity in auditory cognition, 2007, Trends in Neurosciences.
[22] Diego Elgueda, et al. Laminar profile of task-related plasticity in ferret primary auditory cortex, 2018, Scientific Reports.
[23] Joon Son Chung, et al. VoxCeleb2: Deep Speaker Recognition, 2018, INTERSPEECH.
[24] Andrea Vedaldi, et al. Instance Normalization: The Missing Ingredient for Fast Stylization, 2016, arXiv.
[25] Mounya Elhilali, et al. A spectro-temporal modulation index (STMI) for assessment of speech intelligibility, 2003, Speech Commun.
[26] Maneesh Sahani, et al. Models of Neuronal Stimulus-Response Functions: Elaboration, Estimation, and Evaluation, 2017, Front. Syst. Neurosci.
[27] Hervé Bredin, et al. pyannote.metrics: A Toolkit for Reproducible Evaluation, Diagnostic, and Error Analysis of Speaker Diarization Systems, 2017, INTERSPEECH.
[28] Bernd T. Meyer, et al. Spectro-temporal Gabor features for speaker recognition, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Etienne Thoret, et al. Probing machine-learning classifiers using noise, bubbles, and reverse correlation, 2020, Journal of Neuroscience Methods.
[30] D. Gabor. Theory of communication. Part 1: The analysis of information, 1946.
[31] Sophie Rosset, et al. A Comparison of Metric Learning Loss Functions for End-To-End Speaker Verification, 2020, SLSP.
[32] Sanjeev Khudanpur, et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Patrick Kenny, et al. Front-End Factor Analysis for Speaker Verification, 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[34] Daniel L. K. Yamins, et al. A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy, 2018, Neuron.
[35] Powen Ru, et al. Multiresolution spectrotemporal analysis of complex sounds, 2005, The Journal of the Acoustical Society of America.
[36] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[37] Chong Wang, et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, 2015, ICML.
[38] Wiktor Mlynarski, et al. Learning Midlevel Auditory Codes from Natural Sound Statistics, 2017, Neural Computation.
[39] Frédéric E. Theunissen, et al. The vocal repertoire of the domesticated zebra finch: a data-driven approach to decipher the information-bearing acoustic features of communication signals, 2016, Animal Cognition.
[40] Hynek Hermansky, et al. Deriving Spectro-temporal Properties of Hearing from Speech Data, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] Yoshua Bengio, et al. Speaker Recognition from Raw Waveform with SincNet, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[42] Essa Yacoub, et al. Reconstructing the spectrotemporal modulations of real-life sounds from fMRI response patterns, 2017, Proceedings of the National Academy of Sciences.
[43] Maneesh Sahani, et al. Input-Specific Gain Modulation by Local Sensory Context Shapes Cortical and Thalamic Responses to Complex Sounds, 2016, Neuron.
[44] Daniel Povey, et al. MUSAN: A Music, Speech, and Noise Corpus, 2015, arXiv.
[45] J. Fritz, et al. Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex, 2003, Nature Neuroscience.
[46] Yangyang Xia, et al. Learnable Spectro-Temporal Receptive Fields for Robust Voice Type Discrimination, 2020, INTERSPEECH.
[47] Tony Ezzat, et al. Spectro-temporal analysis of speech using 2-D Gabor filters, 2007, INTERSPEECH.
[48] Daniel Fogerty, et al. Improvement and Assessment of Spectro-Temporal Modulation Analysis for Speech Intelligibility Estimation, 2019, INTERSPEECH.
[49] Nicolas Riche, et al. Urban Sound Classification: striving towards a fair comparison, 2020, arXiv.
[50] Masakiyo Fujimoto, et al. Exploiting spectro-temporal locality in deep learning based acoustic event detection, 2015, EURASIP J. Audio Speech Music Process.
[51] Marc M. van Wanrooij, et al. Spectrotemporal Response Properties of Core Auditory Cortex Neurons in Awake Monkey, 2015, PLoS ONE.
[52] Jon Barker, et al. The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines, 2018, INTERSPEECH.
[53] M. Kenward, et al. An Introduction to the Bootstrap, 2007.
[54] Josh H. McDermott, et al. Deep neural network models reveal interplay of peripheral coding and stimulus statistics in pitch perception, 2020, Nature Communications.
[55] Nelson Morgan, et al. Robust CNN-based speech recognition with Gabor filter kernels, 2014, INTERSPEECH.
[56] B. Kollmeier, et al. Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition, 2012, The Journal of the Acoustical Society of America.
[57] Mark D. Plumbley, et al. PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition, 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[58] Wiktor Mlynarski, et al. Ecological origins of perceptual grouping principles in the auditory system, 2019, Proceedings of the National Academy of Sciences.
[59] Joon Son Chung, et al. VoxCeleb: A Large-Scale Speaker Identification Dataset, 2017, INTERSPEECH.
[60] S. A. Shamma, et al. Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex, 2001, Journal of Neurophysiology.
[61] N. C. Singh, et al. Modulation spectra of natural sounds and ethological theories of auditory processing, 2003, The Journal of the Acoustical Society of America.
[62] F. Sheldon, et al. Avian vocalizations and phylogenetic signal, 1997, Proceedings of the National Academy of Sciences of the United States of America.
[63] E. B. Newman, et al. A Scale for the Measurement of the Psychological Magnitude Pitch, 1937.