Multimodal feature extraction and fusion for audio-visual speech recognition
[1] C. R. Rao, et al. The Utilization of Multiple Measurements in Problems of Biological Classification, 1948.
[2] Martin Heckmann, et al. Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition, 2002, EURASIP J. Adv. Signal Process.
[3] P. Yip, et al. Discrete Cosine Transform: Algorithms, Advantages, Applications, 1990.
[4] Pierre Vandergheynst, et al. Analysis of multimodal sequences using geometric video representations, 2006, Signal Process.
[5] F. Fleuret. Fast Binary Feature Selection with Conditional Mutual Information, 2004, J. Mach. Learn. Res.
[6] Karl Pearson F.R.S. LIII. On lines and planes of closest fit to systems of points in space, 1901.
[7] Berthold K. P. Horn, et al. Determining Optical Flow, 1981.
[8] Daniel P. W. Ellis, et al. Using Broad Phonetic Group Experts for Improved Speech Recognition, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[9] Sarel van Vuuren, et al. Relevance of time-frequency features for phonetic and speaker-channel classification, 2000, Speech Commun.
[10] S. Chiba, et al. Dynamic programming algorithm optimization for spoken word recognition, 1978.
[11] Richard Bellman, et al. Adaptive Control Processes: A Guided Tour, 1961, The Mathematical Gazette.
[12] Trevor Darrell, et al. Learning Joint Statistical Models for Audio-Visual Fusion and Segregation, 2000, NIPS.
[13] Moshe Ben-Bassat, et al. Use of distance measures, information measures and error bounds in feature evaluation, 1982, Classification, Pattern Recognition and Reduction of Dimensionality.
[14] Jean-Philippe Thiran, et al. Using entropy as a stream reliability estimate for audio-visual speech recognition, 2008, 2008 16th European Signal Processing Conference.
[15] Javier R. Movellan, et al. Dynamic Features for Visual Speechreading: A Systematic Comparison, 1996, NIPS.
[16] Hervé Bourlard, et al. Connectionist Speech Recognition: A Hybrid Approach, 1993.
[17] Jean-Luc Schwartz, et al. Comparing models for audiovisual fusion in a noisy-vowel recognition task, 1999, IEEE Trans. Speech Audio Process.
[18] Harriet J. Nock, et al. Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study, 2003, CIVR.
[19] Stephen J. Cox, et al. Combining noise compensation with visual information in speech recognition, 1997, AVSP.
[20] S. Furui, et al. Cepstral analysis technique for automatic speaker verification, 1981.
[21] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[22] Hervé Glotin, et al. A new SNR-feature mapping for robust multistream speech recognition, 1999.
[23] Juergen Luettin, et al. Audio-Visual Automatic Speech Recognition: An Overview, 2004.
[24] Alex Bateman, et al. An introduction to hidden Markov models, 2007, Current Protocols in Bioinformatics.
[25] Chalapathy Neti, et al. Stream confidence estimation for audio-visual speech recognition, 2000, INTERSPEECH.
[26] Hynek Hermansky. TRAP-TANDEM: data-driven extraction of temporal features from speech, 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding.
[27] Paul Duchnowski, et al. Adaptive bimodal sensor fusion for automatic speechreading, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[28] Daniel P. W. Ellis, et al. Using mutual information to design class-specific phone recognizers, 2003, INTERSPEECH.
[29] Dana H. Ballard, et al. Computer Vision, 1982.
[30] Jan Flusser, et al. Image registration methods: a survey, 2003, Image Vis. Comput.
[31] Takeo Kanade, et al. An Iterative Image Registration Technique with an Application to Stereo Vision, 1981, IJCAI.
[32] H. Hermansky, et al. Perceptual linear predictive (PLP) analysis of speech, 1990, The Journal of the Acoustical Society of America.
[33] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.
[34] Alexandros Potamianos, et al. Unsupervised Stream Weight Estimation using Anti-Models, 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07).
[35] Milan Sonka, et al. Image Processing, Analysis and Machine Vision, 1993, Springer US.
[36] Pierre Jourlin. Word-dependent acoustic-labial weights in HMM-based speech recognition, 1997, AVSP.
[37] H. McGurk, et al. Hearing lips and seeing voices, 1976, Nature.
[38] Johanna D. Moore, et al. Proceedings of Interspeech 2008, 2008.
[39] S. S. Stevens, et al. The Relation of Pitch to Frequency: A Revised Scale, 1940.
[40] Hervé Bourlard, et al. An introduction to the hybrid HMM/connectionist approach, 1995.
[41] Chalapathy Neti, et al. Recent advances in the automatic recognition of audiovisual speech, 2003, Proc. IEEE.
[42] Jeff A. Bilmes, et al. Dynamic classifier combination in hybrid speech recognition systems using utterance-level confidence values, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP99).
[43] D. P. Morgan, et al. The application of dynamic programming to connected speech recognition, 1990, IEEE ASSP Magazine.
[44] Joseph Picone, et al. Hybrid SVM/HMM architectures for speech recognition, 2000, INTERSPEECH.
[45] A. Adjoudani, et al. On the Integration of Auditory and Visual Parameters in an HMM-based ASR, 1996.
[46] Jiri Matas, et al. On Combining Classifiers, 1998, IEEE Trans. Pattern Anal. Mach. Intell.
[47] D. W. Robinson, et al. A re-determination of the equal-loudness relations for pure tones, 1956.
[48] J. Makhoul, et al. Linear prediction: A tutorial review, 1975, Proceedings of the IEEE.
[50] Richard B. Reilly, et al. Feature analysis for automatic speechreading, 2001, 2001 IEEE Fourth Workshop on Multimedia Signal Processing.
[51] Lambert Schomaker, et al. Audio visual and Multimodal Speech Systems, 2003.
[52] Darryl Stewart, et al. Audio-visual integration for robust speech recognition using maximum weighted stream posteriors, 2007, INTERSPEECH.
[53] Malcolm Slaney, et al. FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks, 2000, NIPS.
[54] Jean-Philippe Thiran, et al. Low-dimensional motion features for audio-visual speech recognition, 2007, 2007 15th European Signal Processing Conference.
[55] Jean-Philippe Thiran, et al. Relevant Feature Selection for Audio-Visual Speech Recognition, 2007, 2007 IEEE 9th Workshop on Multimedia Signal Processing.
[56] Eric David Petajan, et al. Automatic Lipreading to Enhance Speech Recognition (Speech Reading), 1984.
[57] Ephraim Feig, et al. Fast algorithms for the discrete cosine transform, 1992, IEEE Trans. Signal Process.
[58] Fuhui Long, et al. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[59] R. Fisher. The Use of Multiple Measurements in Taxonomic Problems, 1936.
[60] Pierre Vandergheynst, et al. Learning Multimodal Dictionaries, 2007, IEEE Transactions on Image Processing.
[61] Bernhard Schölkopf, et al. Learning with kernels, 2001.
[62] Darryl Stewart, et al. A new posterior based audio-visual integration method for robust speech recognition, 2005, INTERSPEECH.
[63] Gerasimos Potamianos, et al. Exploiting lower face symmetry in appearance-based automatic speechreading, 2005, AVSP.
[64] Hynek Hermansky, et al. Temporal patterns (TRAPs) in ASR of noisy speech, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP99).
[65] E. Zwicker, et al. Subdivision of the audible frequency range into critical bands, 1961.
[66] Didier Le Gall, et al. MPEG: a video compression standard for multimedia applications, 1991, CACM.
[67] Jean-Philippe Thiran, et al. Feature space mutual information in speech-video sequences, 2002, Proceedings of the IEEE International Conference on Multimedia and Expo.
[68] S. S. Stevens. On the psychophysical law, 1957, Psychological Review.
[69] Dimitrios Tzovaras. Multimodal user interfaces: from signals to interaction, 2008.
[70] Jean-Philippe Thiran, et al. From error probability to information theoretic (multi-modal) signal processing, 2005, Signal Process.
[71] John Makhoul, et al. Spectral linear prediction: Properties and applications, 1975.
[72] Roberto Battiti, et al. Using mutual information for selecting features in supervised neural net learning, 1994, IEEE Trans. Neural Networks.
[73] Scott Axelrod, et al. Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[74] Gerasimos Potamianos, et al. Discriminative training of HMM stream exponents for audio-visual speech recognition, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98).
[75] Steve Young, et al. The HTK book, 1995.
[76] Hervé Bourlard, et al. New entropy based combination rules in HMM/ANN multi-stream ASR, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03).
[77] Tsuhan Chen, et al. Real-time lip-synch face animation driven by human voice, 1998, 1998 IEEE Second Workshop on Multimedia Signal Processing.
[78] Javier R. Movellan, et al. Audio Vision: Using Audio-Visual Synchrony to Locate Sounds, 1999, NIPS.
[79] Jeffrey C. Schlimmer, et al. Efficiently Inducing Determinations: A Complete and Systematic Search Algorithm that Uses Optimal Pruning, 1993, ICML.
[80] Robert P. W. Duin, et al. Linear dimensionality reduction via a heteroscedastic extension of LDA: the Chernoff criterion, 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[81] Keiichi Tokuda, et al. Audio-visual speech recognition using MCE-based HMMs and model-dependent stream weights, 2000, INTERSPEECH.
[82] Ioannis Pitas, et al. A Support Vector Machine-Based Dynamic Network for Visual Speech Recognition Applications, 2002, EURASIP J. Adv. Signal Process.
[83] Martin Heckmann, et al. A hybrid ANN/HMM audio-visual speech recognition system, 2001, AVSP.
[84] P. Mermelstein, et al. Distance measures for speech recognition, psychological and instrumental, 1976.
[85] Eric D. Petajan. Automatic lipreading to enhance speech recognition, 1984.
[86] Chalapathy Neti, et al. Improved ROI and within frame discriminant features for lipreading, 2001, Proceedings 2001 International Conference on Image Processing.
[87] L. Baum, et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, 1970.
[88] Jean-Philippe Thiran, et al. A multimodal approach to extract optimized audio features for speaker detection, 2005, 2005 13th European Signal Processing Conference.
[89] Sadaoki Furui, et al. A stream-weight optimization method for audio-visual speech recognition using multi-stream HMMs, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[90] C. H. Chen, et al. Handbook of Pattern Recognition and Computer Vision, 1993.
[91] Harriet J. Nock, et al. Assessing face and speech consistency for monologue detection in video, 2002, MULTIMEDIA '02.
[92] Stan Davis, et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences, 1980.
[93] Juergen Luettin, et al. Audio-Visual Speech Modeling for Continuous Speech Recognition, 2000, IEEE Trans. Multim.
[94] Satoshi Nakamura, et al. Stream weight optimization of speech and lip image sequence for audio-visual speech recognition, 2000, INTERSPEECH.
[95] Javier R. Movellan, et al. Visual Speech Recognition with Stochastic Networks, 1994, NIPS.
[96] Hervé Glotin, et al. Weighting schemes for audio-visual fusion in speech recognition, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[97] Arun Ross, et al. Multimodal biometrics: An overview, 2004, 2004 12th European Signal Processing Conference.
[98] Alexander J. Smola, et al. Learning with kernels, 1998.
[99] Andreas G. Andreou, et al. Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition, 1998, Speech Commun.
[100] Gerasimos Potamianos, et al. Mutual information based visual feature selection for lipreading, 2004, INTERSPEECH.
[101] Gerasimos Potamianos, et al. An image transform approach for HMM based automatic lipreading, 1998, Proceedings 1998 International Conference on Image Processing (ICIP98).
[102] B. Atal. Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, 1974, The Journal of the Acoustical Society of America.
[103] Pierre Vandergheynst, et al. Learning Multi-Modal Dictionaries, 2006.
[104] Ron Kohavi, et al. Feature Selection for Knowledge Discovery and Data Mining, 1998.
[105] Trevor Darrell, et al. Speaker association with signal-level audiovisual fusion, 2004, IEEE Transactions on Multimedia.
[106] Stefan Bilbao, et al. Proceedings of the European Signal Processing Conference, 2005.
[107] Sabri Gurbuz, et al. Application of affine-invariant Fourier descriptors to lipreading for audio-visual speech recognition, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[108] E. B. Newman, et al. A Scale for the Measurement of the Psychological Magnitude Pitch, 1937.
[109] R. Bakis. Continuous speech recognition via centisecond acoustic states, 1976.
[110] Juergen Luettin, et al. Speechreading using Probabilistic Models, 1997, Comput. Vis. Image Underst.
[111] Sadaoki Furui, et al. A stream-weight optimization method for multi-stream HMMs based on likelihood value normalization, 2005, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05).
[112] Josef Kittler, et al. Fast branch & bound algorithms for optimal feature selection, 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[113] Chalapathy Neti, et al. Automatic speechreading of impaired speech, 2001, AVSP.
[114] D. Howard, et al. Speech and audio signal processing: processing and perception of speech and music [Book Review], 2000.
[115] W. H. Sumby, et al. Visual contribution to speech intelligibility in noise, 1954.
[116] Jean-Philippe Thiran, et al. Mutual information eigenlips for audio-visual speech recognition, 2006, 2006 14th European Signal Processing Conference.
[117] Jean-Philippe Thiran, et al. Multimodal speaker localization in a probabilistic framework, 2006, 2006 14th European Signal Processing Conference.
[118] Sabri Gurbuz, et al. Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus, 2002, EURASIP J. Adv. Signal Process.
[119] Hynek Hermansky, et al. RASTA processing of speech, 1994, IEEE Trans. Speech Audio Process.
[120] Biing-Hwang Juang, et al. Fundamentals of speech recognition, 1993, Prentice Hall signal processing series.
[121] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.
[122] Jean-Philippe Thiran, et al. Dynamic modality weighting for multi-stream HMMs in audio-visual speech recognition, 2008, ICMI '08.
[123] Q. Summerfield, et al. Lipreading and audio-visual speech perception, 1992, Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences.
[124] Ralph Gross, et al. Robust Biometric Person Identification Using Automatic Classifier Fusion of Speech, Mouth, and Face Experts, 2007, IEEE Transactions on Multimedia.
[125] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.
[126] Anil K. Jain. Fundamentals of Digital Image Processing, 1989, Prentice Hall.
[127] Azriel Rosenfeld, et al. Computer Vision, 1988, Adv. Comput.