Multimodal learning with deep Boltzmann machines
[1] Paul Smolensky,et al. Information processing in dynamical systems: foundations of harmony theory , 1986 .
[2] L. Younes. Parametric Inference for imperfectly observed Gibbsian fields , 1989 .
[3] David Haussler,et al. Unsupervised learning of distributions on binary vectors using two layer networks , 1991, NIPS 1991.
[4] David J. Field,et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.
[5] L. Younes. On the convergence of markovian stochastic algorithms with rapidly decreasing ergodicity rates , 1999 .
[6] B. S. Manjunath,et al. Color and texture descriptors , 2001, IEEE Trans. Circuits Syst. Video Technol.
[7] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.
[8] J.N. Gowdy,et al. CUAVE: A new audio-visual database for multimodal human-computer interface research , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[9] Timothy F. Cootes,et al. Extraction of Visual Features for Lipreading , 2002, IEEE Trans. Pattern Anal. Mach. Intell.
[10] Geoffrey E. Hinton,et al. Exponential Family Harmoniums with an Application to Information Retrieval , 2004, NIPS.
[11] Antonio Torralba,et al. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.
[12] Alan L. Yuille,et al. The Convergence of Contrastive Divergences , 2004, NIPS.
[13] Rong Yan,et al. Mining Associated Text and Images with Dual-Wing Harmoniums , 2005, UAI.
[14] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[15] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[16] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[17] Sridha Sridharan,et al. Patch-Based Representation of Visual Speech , 2006 .
[18] Petros Maragos,et al. Multimodal Fusion and Learning with Uncertain Features Applied to Audiovisual Speech Recognition , 2007, 2007 IEEE 9th Workshop on Multimedia Signal Processing.
[19] H. Robbins. A Stochastic Approximation Method , 1951 .
[20] Andrew Zisserman,et al. Image Classification using Random Forests and Ferns , 2007, 2007 IEEE 11th International Conference on Computer Vision.
[21] Stephen J. Cox,et al. The challenge of multispeaker lip-reading , 2008, AVSP.
[22] Tijmen Tieleman,et al. Training restricted Boltzmann machines using approximations to the likelihood gradient , 2008, ICML '08.
[23] Yoshua Bengio,et al. Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.
[24] Mark J. Huiskes,et al. The MIR flickr retrieval evaluation , 2008, MIR '08.
[25] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.
[26] Jean-Philippe Thiran,et al. Information Theoretic Feature Extraction for Audio-Visual Speech Recognition , 2009, IEEE Transactions on Signal Processing.
[27] Petros Maragos,et al. Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition , 2009, IEEE Trans. Speech Audio Process.
[28] Geoffrey E. Hinton,et al. Deep Boltzmann Machines , 2009, AISTATS.
[29] Matti Pietikäinen,et al. Lipreading with Local Spatiotemporal Descriptors , 2009, IEEE Transactions on Multimedia.
[30] Geoffrey E. Hinton,et al. Replicated Softmax: an Undirected Topic Model , 2009, NIPS.
[31] Bart Thomee,et al. New trends and ideas in visual concept detection: the MIR flickr retrieval evaluation initiative , 2010, MIR '10.
[32] Yann LeCun,et al. Convolutional Learning of Spatio-temporal Features , 2010, ECCV.
[33] Cordelia Schmid,et al. Image annotation with tagprop on the MIRFLICKR set , 2010, MIR '10.
[34] Andrea Vedaldi,et al. Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.
[35] Özgür Ulusoy,et al. Bilvideo-7: an MPEG-7- compatible video indexing and retrieval system , 2010 .
[36] Cordelia Schmid,et al. Multimodal semi-supervised learning for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[37] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[38] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.
[39] Geoffrey E. Hinton,et al. Acoustic Modeling Using Deep Belief Networks , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[40] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res.