Exploring Deep Learning Methods for Discovering Features in Speech Signals
[1] E. B. Newman et al. A Scale for the Measurement of the Psychological Magnitude Pitch, 1937.
[2] K. Stevens et al. An Electrical Analog of the Vocal Tract, 1953.
[3] O. Fujimura et al. Model for Specification of the Vocal-Tract Area Function, 1966.
[4] J. Flanagan et al. Excitation of vocal-tract synthesizers, 1969, The Journal of the Acoustical Society of America.
[5] C. H. Coker et al. Synthetic voices for computers, 1970, IEEE Spectrum.
[6] L. Baum et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, 1970.
[7] J. Flanagan. Speech Analysis, Synthesis and Perception, 1971.
[8] J. Flanagan et al. Synthesis of voiced sounds from a two-mass model of the vocal cords, 1972.
[9] John Makhoul et al. LPCW: An LPC vocoder with linear predictive spectral warping, 1976, ICASSP.
[10] R. Patterson. Auditory filter shapes derived with noise stimuli, 1976, The Journal of the Acoustical Society of America.
[11] D. Rubin et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.
[12] Frederick Jelinek et al. Continuous speech recognition, 1977, SGAR.
[13] A. B. Poritz et al. Linear predictive hidden Markov models and the speech signal, 1982, ICASSP.
[14] B. Moore et al. Suggested formulae for calculating auditory-filter bandwidths and excitation patterns, 1983, The Journal of the Acoustical Society of America.
[15] Donald Geman et al. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[16] C. D. Geisler et al. Frequency selectivity of single cochlear-nerve fibers based on the temporal response pattern to two-tone signals, 1986, The Journal of the Acoustical Society of America.
[17] Paul Smolensky et al. Information processing in dynamical systems: foundations of harmony theory, 1986.
[18] Lalit R. Bahl et al. Maximum mutual information estimation of hidden Markov model parameters for speech recognition, 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.
[19] D. O'Shaughnessy et al. Linear predictive coding, 1988, IEEE Potentials.
[20] Satoshi Nakamura et al. Voice conversion through vector quantization, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.
[21] L. Carney et al. Temporal coding of resonances by low-frequency auditory nerve fibers: single-fiber responses and a population model, 1988, Journal of Neurophysiology.
[22] Stephen Cox et al. Some statistical issues in the comparison of speech recognition algorithms, 1989, International Conference on Acoustics, Speech, and Signal Processing.
[23] Lawrence R. Rabiner et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.
[24] Pierre Priouret et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.
[25] H. Hermansky et al. Perceptual linear predictive (PLP) analysis of speech, 1990, The Journal of the Acoustical Society of America.
[26] Hervé Bourlard et al. Continuous speech recognition using multilayer perceptrons with hidden Markov models, 1990, International Conference on Acoustics, Speech, and Signal Processing.
[27] Michael Picheny et al. Decision trees for phonological rules in continuous speech, 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.
[28] Horacio Franco et al. Multiple-State Context-Dependent Phonetic Modeling with MLP, 1992.
[29] Janet M. Baker et al. The Design for the Wall Street Journal-based CSR Corpus, 1992, HLT.
[30] Hervé Bourlard et al. CDNN: a context dependent neural network for continuous speech recognition, 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[31] Yoshua Bengio et al. Global optimization of a neural network-hidden Markov model hybrid, 1992, IEEE Trans. Neural Networks.
[32] Hervé Bourlard et al. Connectionist Speech Recognition: A Hybrid Approach, 1993.
[33] Jonathan G. Fiscus et al. DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM, 1993, NIST.
[34] Joseph Picone et al. Signal modeling techniques in speech recognition, 1993, Proc. IEEE.
[35] Igor Zlokarnik. Experiments with an articulatory speech recognizer, 1993, EUROSPEECH.
[36] Horacio Franco et al. Context-dependent connectionist probability estimation in a hybrid hidden Markov model-neural net speech recognition system, 1994, Comput. Speech Lang.
[37] Jun S. Liu et al. Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes, 1994.
[38] S. J. Young et al. Tree-based state tying for high accuracy acoustic modelling, 1994.
[39] Man Mohan Sondhi et al. Techniques for estimating vocal-tract shapes from the speech signal, 1994, IEEE Trans. Speech Audio Process.
[40] Hervé Bourlard et al. Connectionist probability estimators in HMM speech recognition, 1994, IEEE Trans. Speech Audio Process.
[41] Anthony J. Robinson et al. An application of recurrent nets to phone probability estimation, 1994, IEEE Trans. Neural Networks.
[42] Hamid Sheikhzadeh et al. Waveform-based speech recognition using hidden filter models: parameter selection and sensitivity to power normalization, 1994, IEEE Trans. Speech Audio Process.
[43] Steve Young et al. The HTK book, 1995.
[44] Philip C. Woodland et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, 1995, Comput. Speech Lang.
[45] Steve Renals et al. The Use of Recurrent Neural Networks in Continuous Speech Recognition, 1996.
[46] S. Wegmann et al. Speaker normalization on conversational telephone speech, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[47] S. Young et al. Lattice-based discriminative training for large vocabulary speech recognition, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[48] Roberto Gemello et al. Hybrid HMM-NN modeling of stationary-transitional units for continuous speech recognition, 2000, Inf. Sci.
[49] Steve J. Young et al. MMIE training of large vocabulary recognition systems, 1997, Speech Communication.
[50] Eric Moulines et al. Continuous probabilistic transform for voice conversion, 1998, IEEE Trans. Speech Audio Process.
[51] Yoshua Bengio et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[52] Mark J. F. Gales et al. Maximum likelihood linear transformations for HMM-based speech recognition, 1998, Comput. Speech Lang.
[53] Li Lee et al. A frequency warping approach to speaker normalization, 1998, IEEE Trans. Speech Audio Process.
[54] Alexander Kain et al. Spectral voice conversion for text-to-speech synthesis, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[55] Mark J. F. Gales et al. Semi-tied covariance matrices for hidden Markov models, 1999, IEEE Trans. Speech Audio Process.
[56] Hermann Ney et al. Dynamic programming search for continuous speech recognition, 1999, IEEE Signal Process. Mag.
[57] Daniel P. W. Ellis et al. Size matters: an empirical study of neural network training for large vocabulary continuous speech recognition, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).
[58] Alan Wrench et al. Continuous speech recognition using articulatory data, 2000, INTERSPEECH.
[59] James H. Martin et al. Speech and language processing: an introduction to natural language processing, 2000.
[60] Geoffrey Zweig et al. Lattice-Based Unsupervised MLLR for Speaker Adaptation, 2000.
[61] Daniel Povey et al. Large scale discriminative training for speech recognition, 2000.
[62] James H. Martin et al. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2000.
[63] Ho-Young Jung et al. Speech feature extraction using independent component analysis, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[64] Steve J. Young et al. Statistical Modeling in Continuous Speech Recognition (CSR), 2001, UAI.
[65] Marco Gori et al. A survey of hybrid ANN/HMM models for automatic speech recognition, 2001, Neurocomputing.
[66] Daniel Povey et al. Minimum Phone Error and I-smoothing for improved discriminative training, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[67] Michael S. Lewicki et al. Efficient coding of natural sounds, 2002, Nature Neuroscience.
[68] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence, 2002, Neural Computation.
[69] Fernando Pereira et al. Weighted finite-state transducers in speech recognition, 2002, Comput. Speech Lang.
[70] Patrice Y. Simard et al. Best practices for convolutional neural networks applied to visual document analysis, 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
[71] Zoubin Ghahramani et al. Optimization with EM and Expectation-Conjugate-Gradient, 2003, ICML.
[72] James R. Glass. A probabilistic framework for segment-based speech recognition, 2003, Comput. Speech Lang.
[73] Hui Ye et al. Perceptually weighted linear transformations for voice conversion, 2003, INTERSPEECH.
[74] Geoffrey E. Hinton et al. Exponential Family Harmoniums with an Application to Information Retrieval, 2004, NIPS.
[75] Paul Lamere et al. Sphinx-4: a flexible open source framework for speech recognition, 2004.
[76] George Saon et al. Feature space Gaussianization, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[77] Michael J. Black et al. Fields of Experts: a framework for learning image priors, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[78] Philipp Slusallek et al. Introduction to real-time ray tracing, 2005, SIGGRAPH Courses.
[79] Yann LeCun et al. The MNIST database of handwritten digits, 2005.
[80] Geoffrey E. Hinton et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[81] Geoffrey E. Hinton et al. Modeling Human Motion Using Binary Latent Variables, 2006, NIPS.
[82] Yee Whye Teh et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.
[83] Michael S. Lewicki et al. Efficient auditory coding, 2006, Nature.
[84] Geoffrey E. Hinton et al. Unsupervised Learning of Image Transformations, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[85] Radford M. Neal. Pattern Recognition and Machine Learning, 2007, Technometrics.
[86] Jonathan Le Roux et al. Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[87] Geoffrey E. Hinton et al. Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure, 2007, AISTATS.
[88] Tomoki Toda et al. Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[89] Florian Metze. Discriminative speaker adaptation using articulatory features, 2007, Speech Commun.
[90] Brian Kingsbury et al. Boosted MMI for model and feature-space discriminative training, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[91] Khe Chai Sim et al. Discriminative Product-of-Expert acoustic mapping for cross-lingual phone recognition, 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.
[92] Li Fei-Fei et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[93] Brian Kingsbury et al. Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[94] Francoise Beaufays et al. “Your Word is my Command”: Google Search by Voice: A Case Study, 2010.
[95] Yoram Singer et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[96] Michael S. Lewicki et al. Information theory: A signal take on speech, 2010, Nature.
[97] Geoffrey E. Hinton et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.
[98] Dong Yu et al. Investigation of full-sequence training of deep belief networks for speech recognition, 2010, INTERSPEECH.
[99] Geoffrey E. Hinton et al. Modeling pixel means and covariances using factorized third-order Boltzmann machines, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[100] Brian Kingsbury et al. The IBM Attila speech recognition toolkit, 2010, 2010 IEEE Spoken Language Technology Workshop.
[101] Guangsen Wang et al. Sequential Classification Criteria for NNs in Automatic Speech Recognition, 2011, INTERSPEECH.
[102] Larry Gillick et al. Don't multiply lightly: Quantifying problems with the acoustic model assumptions in speech recognition, 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[103] Geoffrey E. Hinton et al. A new way to learn acoustic events, 2011.
[104] Geoffrey E. Hinton et al. Transforming Auto-Encoders, 2011, ICANN.
[105] Daniel Povey et al. The Kaldi Speech Recognition Toolkit, 2011.
[106] Geoffrey E. Hinton et al. Learning a better representation of speech soundwaves using restricted Boltzmann machines, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[107] Lukás Burget et al. Empirical Evaluation and Combination of Advanced Language Modeling Techniques, 2011, INTERSPEECH.
[108] Luca Maria Gambardella et al. High-Performance Neural Networks for Visual Object Classification, 2011, ArXiv.
[109] Kai Feng et al. The subspace Gaussian mixture model - A structured model for speech recognition, 2011, Comput. Speech Lang.
[110] Phil Hoole et al. Announcing the Electromagnetic Articulography (Day 1) Subset of the mngu0 Articulatory Corpus, 2011, INTERSPEECH.
[111] Nitish Srivastava et al. Improving neural networks by preventing co-adaptation of feature detectors, 2012, ArXiv.
[112] Frank Rudzicz et al. The TORGO database of acoustic and articulatory speech from speakers with dysarthria, 2011, Language Resources and Evaluation.
[113] Dong Yu et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[114] Moncef Gabbouj et al. Voice Conversion Using Dynamic Kernel Partial Least Squares Regression, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[115] Larry Gillick et al. Discriminative training for speech recognition is compensating for statistical dependence in the HMM framework, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[116] Gerald Penn et al. Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[117] Navdeep Jaitly et al. Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition, 2012, INTERSPEECH.
[118] Tara N. Sainath et al. Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization, 2012, INTERSPEECH.
[119] Geoffrey E. Hinton et al. Acoustic Modeling Using Deep Belief Networks, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[120] Tara N. Sainath et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition, 2012.
[121] Tara N. Sainath et al. Accelerating Hessian-free optimization for Deep Neural Networks by implicit preconditioning and sampling, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[122] Navdeep Jaitly et al. Hybrid speech recognition with Deep Bidirectional LSTM, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[123] Dimitri Palaz et al. Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks, 2013, INTERSPEECH.
[124] Dong Yu et al. Error back propagation for sequence training of Context-Dependent Deep Networks for conversational speech transcription, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[125] Li Deng et al. A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[126] Geoffrey E. Hinton et al. Using an autoencoder with deformable templates to discover features for automated speech recognition, 2013, INTERSPEECH.
[127] Dong Yu et al. Exploring convolutional neural network structures and optimization techniques for speech recognition, 2013, INTERSPEECH.
[128] Georg Heigold et al. Multiframe deep neural networks for acoustic modeling, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[129] Naoyuki Kanda et al. Elastic spectral distortion for low resource speech recognition with deep neural networks, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[130] Lukás Burget et al. Sequence-discriminative training of deep neural networks, 2013, INTERSPEECH.
[131] Navdeep Jaitly et al. Vocal Tract Length Perturbation (VTLP) improves speech recognition, 2013.
[132] Tijmen Tieleman et al. Optimizing Neural Networks that Generate Images, 2014.
[133] Geoffrey E. Hinton et al. Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models, 2014, INTERSPEECH.