Advanced Data Exploitation in Speech Analysis: An overview
[1] Eduardo Coutinho,et al. Cooperative Learning and its Application to Emotion Recognition from Speech , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[2] Jing Huang,et al. Multi-View and Multi-Objective Semi-Supervised Learning for HMM-Based Automatic Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[3] Dong Yu,et al. Active Learning and Semi-supervised Learning for Speech Recognition: a Unified Framework Using the Global Entropy Reduction Maximization Criterion , 2010, Comput. Speech Lang..
[4] Honglak Lee,et al. Unsupervised feature learning for audio classification using convolutional deep belief networks , 2009, NIPS.
[5] Heiga Zen,et al. Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends , 2015, IEEE Signal Processing Magazine.
[6] Yifan Gong,et al. Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[7] Joseph Polifroni,et al. Crowd translator: on building localized speech recognizers through micropayments , 2010, OPSR.
[8] Sarah Jane Delany,et al. Using Crowdsourcing for Labelling Emotional Speech Assets , 2010 .
[9] Avrim Blum,et al. The Bottleneck , 2021, Monopsony Capitalism.
[10] Alexander H. Waibel,et al. Unsupervised training of a speech recognizer: recent experiments , 1999, EUROSPEECH.
[11] Eduardo Coutinho,et al. Distributing Recognition in Computational Paralinguistics , 2014, IEEE Transactions on Affective Computing.
[12] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[13] Geoffrey E. Hinton,et al. Binary coding of speech spectrograms using a deep auto-encoder , 2010, INTERSPEECH.
[14] James R. Glass,et al. Towards multi-speaker unsupervised speech pattern discovery , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[15] Ji Xi,et al. Practical Speech Emotion Recognition Based on Online Learning: From Acted Data to Elicited Data , 2013 .
[16] Björn W. Schuller,et al. The Computational Paralinguistics Challenge [Social Sciences] , 2012, IEEE Signal Processing Magazine.
[17] Christian Biemann,et al. Using representation learning and out-of-domain data for a paralinguistic speech task , 2015, INTERSPEECH.
[18] Thomas Fang Zheng,et al. Transfer learning for speech and language processing , 2015, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[19] Bin Ma,et al. Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[20] Jasha Droppo,et al. Improving speech recognition in reverberation using a room-aware deep neural network and multi-task learning , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] C. Moseley,et al. Atlas Of The World’s Languages In Danger , 2015 .
[22] Chris Callison-Burch,et al. Cheap, Fast and Good Enough: Automatic Speech Recognition with Non-Expert Transcription , 2010, NAACL.
[23] Carmen Peláez-Moreno,et al. Data Balancing for Efficient Training of Hybrid ANN/HMM Automatic Speech Recognition Systems , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[24] David Suendermann,et al. Crowdsourcing for Speech Processing: Applications to Data Collection, Transcription and Assessment , 2013 .
[25] Hermann Ney,et al. Data augmentation, feature combination, and multilingual neural networks to improve ASR and KWS performance for low-resource languages , 2014, INTERSPEECH.
[26] Björn W. Schuller,et al. Deep neural networks for acoustic emotion recognition: Raising the benchmarks , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Vladimir Naumovich Vapnik. The Nature of Statistical Learning Theory , 1995 .
[28] Xiao Li,et al. Regularized Adaptation of Discriminative Classifiers , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[29] Geoffrey E. Hinton,et al. Acoustic Modeling Using Deep Belief Networks , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[30] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.
[31] Björn W. Schuller,et al. Speech Analysis in the Big Data Era , 2015, TSD.
[32] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[33] Dong Yu,et al. Deep Learning: Methods and Applications , 2014, Found. Trends Signal Process..
[34] Biing-Hwang Juang,et al. Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.
[35] Aren Jansen,et al. The zero resource speech challenge 2017 , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[36] Bernhard Schölkopf,et al. Correcting Sample Selection Bias by Unlabeled Data , 2006, NIPS.
[37] Jeff A. Bilmes,et al. Submodular subset selection for large-scale speech training data , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Björn W. Schuller,et al. Autoencoder-based Unsupervised Domain Adaptation for Speech Emotion Recognition , 2014, IEEE Signal Processing Letters.
[39] Mark J. F. Gales,et al. Support vector machines for noise robust ASR , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.
[40] Björn W. Schuller,et al. Active Learning by Sparse Instance Tracking and Classifier Confidence in Acoustic Emotion Recognition , 2012, INTERSPEECH.
[41] Florian Metze,et al. Speaker Adaptive Training of Deep Neural Network Acoustic Models Using I-Vectors , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[42] Björn W. Schuller,et al. Introducing shared-hidden-layer autoencoders for transfer learning and their application in acoustic emotion recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[43] Philip C. Woodland,et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..
[44] Robert I. Damper,et al. On Acoustic Emotion Recognition: Compensating for Covariate Shift , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[45] Seetha Hari,et al. Learning From Imbalanced Data , 2019, Advances in Computer and Electrical Engineering.
[46] Haizhou Li,et al. Semi-Supervised and Cross-Lingual Knowledge Transfer Learnings for DNN Hybrid Acoustic Models Under Low-Resource Conditions , 2016, INTERSPEECH.
[47] Richard M. Schwartz,et al. Discriminative semi-supervised training for keyword search in low resource languages , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[48] Yuan Liu,et al. Speaker verification with deep features , 2014, 2014 International Joint Conference on Neural Networks (IJCNN).
[49] Hermann Ney,et al. Unsupervised training of acoustic models for large vocabulary continuous speech recognition , 2005, IEEE Transactions on Speech and Audio Processing.
[50] Dilek Z. Hakkani-Tür,et al. Active learning: theory and applications to automatic speech recognition , 2005, IEEE Transactions on Speech and Audio Processing.
[51] Motoaki Kawanabe,et al. Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation , 2007, NIPS.
[52] Mark D. Plumbley,et al. Fast Dictionary Learning for Sparse Representations of Speech Signals , 2011, IEEE Journal of Selected Topics in Signal Processing.
[53] Dong Yu,et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[54] Mark J. F. Gales,et al. Unsupervised training and directed manual transcription for LVCSR , 2010, Speech Commun..
[55] Zixing Zhang,et al. An Agreement and Sparseness-based Learning Instance Selection and its Application to Subjective Speech Phenomena , 2014, LREC 2014.
[56] Aren Jansen,et al. Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[57] Kenneth Ward Church,et al. Deep neural network features and semi-supervised training for low resource speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[58] Yuzong Liu,et al. Graph-Based Semisupervised Learning for Acoustic Modeling in Automatic Speech Recognition , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[59] Nitesh V. Chawla,et al. SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..
[60] Takafumi Kanamori,et al. A Least-squares Approach to Direct Importance Estimation , 2009, J. Mach. Learn. Res..
[61] Eduardo Coutinho,et al. Enhanced semi-supervised learning for multimodal emotion recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[62] Xavier Anguera Miró,et al. Speed improvements to Information Retrieval-based dynamic time warping using hierarchical K-Means clustering , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[63] David J. Field,et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.
[64] Enrique Marcelo Albornoz,et al. Deep Learning for Emotional Speech Recognition , 2014, MCPR.
[65] Yoshua Bengio,et al. Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.
[66] James R. Glass,et al. A Transcription Task for Crowdsourcing with Automatic Quality Control , 2011, INTERSPEECH.
[67] Ke Chen,et al. Exploring hierarchical speech representations with a deep convolutional neural network , 2011 .
[68] Patrick Kenny,et al. Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[69] Björn W. Schuller,et al. Cross lingual speech emotion recognition using canonical correlation analysis on principal component subspace , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[70] Douglas D. O'Shaughnessy,et al. Speech communications - human and machine, 2nd Edition , 2000 .
[71] Xiaojin Zhu,et al. --1 CONTENTS , 2006 .
[72] Andrew McCallum,et al. Toward Optimal Active Learning through Monte Carlo Estimation of Error Reduction , 2001, ICML 2001.
[73] Philip C. Woodland. Speaker adaptation for continuous density HMMs: a review , 2001 .
[74] Chong Wang,et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[75] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[76] Oscar Saz-Torralba,et al. Data-selective transfer learning for multi-domain speech recognition , 2015, INTERSPEECH.
[77] Zixing Zhang,et al. Semi-Autonomous Data Enrichment and Optimisation for Intelligent Speech Analysis , 2015 .
[78] Dilek Z. Hakkani-Tür,et al. Active and unsupervised learning for automatic speech recognition , 2003, INTERSPEECH.
[79] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[80] Jason D. Williams,et al. Crowd-sourcing for difficult transcription of speech , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[81] Navdeep Jaitly,et al. Vocal Tract Length Perturbation (VTLP) improves speech recognition , 2013 .
[82] Erik Marchi,et al. Sparse Autoencoder-Based Feature Transfer Learning for Speech Emotion Recognition , 2013, 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction.
[83] Stanley Peters,et al. Conversational In-Vehicle Dialog Systems: The past, present, and future , 2016, IEEE Signal Processing Magazine.
[84] Dong Yu,et al. Maximizing global entropy reduction for active learning in speech recognition , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[85] Gerald Penn,et al. Convolutional Neural Networks for Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[86] Björn W. Schuller,et al. The INTERSPEECH 2009 emotion challenge , 2009, INTERSPEECH.
[87] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[88] Maxine Eskénazi,et al. Toward better crowdsourced transcription: Transcription of a year of the Let's Go Bus Information System data , 2010, 2010 IEEE Spoken Language Technology Workshop.
[89] Sanjeev Khudanpur,et al. A time delay neural network architecture for efficient modeling of long temporal contexts , 2015, INTERSPEECH.
[90] Rong Zhang,et al. Data selection for speech recognition , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).
[91] Jasha Droppo,et al. Multi-task learning in deep neural networks for improved phoneme recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[92] A. Tanju Erdem,et al. RANSAC-based training data selection for emotion recognition from spontaneous speech , 2010, AFFINE '10.
[93] Herman J. M. Steeneken,et al. Optimal selection of speech data for automatic speech recognition systems , 2002, INTERSPEECH.
[94] Björn W. Schuller,et al. Synthesized speech for model training in cross-corpus recognition of human emotion , 2012, International Journal of Speech Technology.
[95] Sanjeev Khudanpur,et al. Audio augmentation for speech recognition , 2015, INTERSPEECH.
[96] Sotiris B. Kotsiantis,et al. Speaker Identification Using Semi-supervised Learning , 2015, SPECOM.
[97] Yoshua Bengio,et al. Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.
[98] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[99] Honglak Lee,et al. Deep learning for robust feature generation in audiovisual emotion recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[100] James R. Glass,et al. Unsupervised Pattern Discovery in Speech , 2008, IEEE Transactions on Audio, Speech, and Language Processing.
[101] Björn W. Schuller,et al. iHEARu-PLAY: Introducing a game for crowdsourced data collection for affective computing , 2015, 2015 International Conference on Affective Computing and Intelligent Interaction (ACII).
[102] Björn W. Schuller,et al. Unsupervised learning in cross-corpus acoustic emotion recognition , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[103] Koichi Shinoda,et al. Speech modeling based on committee-based active learning , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[104] Burr Settles,et al. Active Learning Literature Survey , 2009 .
[105] Brendan T. O'Connor,et al. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks , 2008, EMNLP.
[106] László Tóth,et al. Kernel-based feature extraction with a speech technology application , 2004, IEEE Transactions on Signal Processing.
[107] Simone Scardapane,et al. Fully Decentralized Semi-supervised Learning via Privacy-preserving Matrix Completion , 2017, IEEE Transactions on Neural Networks and Learning Systems.
[108] Aren Jansen,et al. Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[109] Kamal Nigam,et al. Employing EM in Pool-based Active Learning for Text Classification , 1998 .
[110] Kenneth Ward Church,et al. A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[111] William A. Ainsworth,et al. Feedback Strategies for Error Correction in Speech Recognition Systems , 1992, Int. J. Man Mach. Stud..
[112] Yifan Gong,et al. An Overview of Noise-Robust Automatic Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[113] Yun Lei,et al. A novel scheme for speaker recognition using a phonetically-aware deep neural network , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[114] Xiaodong Cui,et al. Data Augmentation for Deep Neural Network Acoustic Modeling , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[115] Dong Yu,et al. Conversational Speech Transcription Using Context-Dependent Deep Neural Networks , 2012, ICML.
[116] Jean-Luc Gauvain,et al. Active learning based data selection for limited resource STT and KWS , 2015, INTERSPEECH.
[117] Georg Heigold,et al. Multilingual acoustic models using distributed deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[118] Peter Norvig,et al. The Unreasonable Effectiveness of Data , 2009, IEEE Intelligent Systems.
[119] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.
[120] Craig A. Knoblock,et al. Active + Semi-supervised Learning = Robust Multi-View Learning , 2002, ICML.
[121] Xiao Li,et al. Machine Learning Paradigms for Speech Recognition: An Overview , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[122] Biing-Hwang Juang,et al. Recurrent deep neural networks for robust speech recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[123] H. Sebastian Seung,et al. Selective Sampling Using the Query by Committee Algorithm , 1997, Machine Learning.
[124] Nitish Srivastava,et al. Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.
[125] Lin-Shan Lee,et al. Audio Word2Vec: Unsupervised Learning of Audio Segment Representations Using Sequence-to-Sequence Autoencoder , 2016, INTERSPEECH.
[126] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[127] Qiang Yang,et al. A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.
[128] Eduardo Coutinho,et al. On rater reliability and agreement based dynamic active learning , 2015, 2015 International Conference on Affective Computing and Intelligent Interaction (ACII).
[129] Björn W. Schuller,et al. Universum Autoencoder-Based Domain Adaptation for Speech Emotion Recognition , 2017, IEEE Signal Processing Letters.
[130] Björn W. Schuller,et al. Recent developments in openSMILE, the munich open-source multimedia feature extractor , 2013, ACM Multimedia.
[131] DeLiang Wang,et al. Ideal ratio mask estimation using deep neural networks for robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[132] Jingbo Zhu,et al. Active Learning With Sampling by Uncertainty and Density for Data Annotations , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[133] Björn Schuller,et al. The Computational Paralinguistics Challenge , 2012 .
[134] H. Shimodaira,et al. Improving predictive inference under covariate shift by weighting the log-likelihood function , 2000 .
[135] Vidhyasaharan Sethu,et al. Analysis of acoustic space variability in speech affected by depression , 2015, Speech Commun..
[136] S.Y. Kung,et al. Compressive Privacy: From Information/Estimation Theory to Machine Learning [Lecture Notes] , 2017, IEEE Signal Processing Magazine.
[137] Björn W. Schuller,et al. Co-training succeeds in Computational Paralinguistics , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.