Multimodal Deep Learning
Jiquan Ngiam | Aditya Khosla | Mingyu Kim | Juhan Nam | Honglak Lee | Andrew Y. Ng
[1] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[2] B.P. Yuhas,et al. Integration of acoustic and visual speech signals using neural networks , 1989, IEEE Communications Magazine.
[3] Q. Summerfield,et al. Lipreading and audio-visual speech perception. , 1992, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.
[4] Yochai Konig,et al. "Eigenlips" for robust speech recognition , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[5] Alexander H. Waibel,et al. See Me, Hear Me: Integrating Automatic Speech Recognition and Lip-reading , 1994 .
[6] Paul Duchnowski,et al. Adaptive bimodal sensor fusion for automatic speechreading , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[7] Rich Caruana,et al. Multitask Learning , 1997, Machine Learning.
[8] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.
[9] J.N. Gowdy,et al. CUAVE: A new audio-visual database for multimodal human-computer interface research , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[10] Timothy F. Cootes,et al. Extraction of Visual Features for Lipreading , 2002, IEEE Trans. Pattern Anal. Mach. Intell..
[11] John Shawe-Taylor,et al. Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.
[12] Juergen Luettin,et al. Audio-Visual Automatic Speech Recognition: An Overview , 2004 .
[13] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[14] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[15] Petros Maragos,et al. Adaptive multimodal fusion by uncertainty compensation , 2006, INTERSPEECH.
[16] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[17] Sridha Sridharan,et al. Patch-Based Representation of Visual Speech , 2006 .
[18] Honglak Lee,et al. Sparse deep belief net model for visual area V2 , 2007, NIPS.
[19] Petros Maragos,et al. Multimodal Fusion and Learning with Uncertain Features Applied to Audiovisual Speech Recognition , 2007, 2007 IEEE 9th Workshop on Multimedia Signal Processing.
[20] Rajat Raina,et al. Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.
[21] Stephen J. Cox,et al. The challenge of multispeaker lip-reading , 2008, AVSP.
[22] Yoshua Bengio,et al. Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.
[23] Jean-Philippe Thiran,et al. Information Theoretic Feature Extraction for Audio-Visual Speech Recognition , 2009, IEEE Transactions on Signal Processing.
[24] Matti Pietikäinen,et al. Lipreading with Local Spatiotemporal Descriptors , 2009, IEEE Trans. Multim..
[25] Geoffrey E. Hinton,et al. Semantic hashing , 2009, Int. J. Approx. Reason..
[26] Petros Maragos,et al. Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition , 2009, IEEE Transactions on Audio, Speech, and Language Processing.