Semantic retrieval of personal photos using a deep autoencoder fusing visual features with speech annotations represented as word/paragraph vectors
Lin-Shan Lee | Hung-yi Lee | Yuan-ming Liou | Hung-tsung Lu
[1] Lin-Shan Lee, et al. Enhancing sparse voice annotation for semantic retrieval of personal photos by continuous space word representations, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Alex Acero, et al. Soft indexing of speech content for search in spoken documents, 2007, Comput. Speech Lang.
[3] Quoc V. Le, et al. Distributed Representations of Sentences and Documents, 2014, ICML.
[4] Lin-Shan Lee, et al. Improved Spoken Document Retrieval With Dynamic Key Term Lexicon and Probabilistic Latent Semantic Analysis (PLSA), 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[5] Geoffrey E. Hinton, et al. Distributed Representations, 1986, The Philosophy of Artificial Intelligence.
[6] George A. Miller, et al. WordNet: A Lexical Database for English, 1995, HLT.
[7] Lin-Shan Lee, et al. Analytical comparison between position specific posterior lattices and confusion networks based on words and subword units for spoken document indexing, 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).
[8] Shih-Fu Chang, et al. VisualSEEk: a fully automated content-based image query system, 1997, MULTIMEDIA '96.
[9] Florian Metze, et al. Two-layer mutually reinforced random walk for improved multi-party meeting summarization, 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).
[10] John R. Smith, et al. Large-scale concept ontology for multimedia, 2006, IEEE MultiMedia.
[11] Yi-Hsuan Yang, et al. ContextSeer: context search and recommendation at query time for shared consumer photos, 2008, ACM Multimedia.
[12] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[13] Wenjie Li, et al. Mutually Reinforced Manifold-Ranking Based Relevance Propagation Model for Query-Focused Multi-Document Summarization, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[14] Pierre Tirilly, et al. Language modeling for bag-of-visual words image categorization, 2008, CIVR '08.
[15] Shih-Fu Chang, et al. Columbia University’s Baseline Detectors for 374 LSCOM Semantic Visual Concepts, 2007.
[16] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[17] Lin-Shan Lee, et al. Latent semantic retrieval of personal photos with sparse user annotation by fused image/speech/text features, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[18] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[19] Tele Tan, et al. An Improved Method for Image Retrieval Using Speech Annotation, 2003, MMM.
[20] Jeffrey L. Elman, et al. Finding Structure in Time, 1990, Cogn. Sci.
[21] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.
[22] Chong-Wah Ngo, et al. Evaluating bag-of-visual-words representations in scene classification, 2007, MIR '07.
[23] Lin-Shan Lee, et al. Semantic retrieval of personal photos using matrix factorization and two-layer random walk fusing sparse speech annotations with visual features, 2014, INTERSPEECH.
[24] Dragutin Petkovic, et al. Query by Image and Video Content: The QBIC System, 1995, Computer.
[25] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[26] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[27] Juhan Nam, et al. Multimodal Deep Learning, 2011, ICML.
[28] Thomas Hofmann, et al. Probabilistic Latent Semantic Indexing, 1999, SIGIR Forum.
[29] H. Sebastian Seung, et al. Learning the parts of objects by non-negative matrix factorization, 1999, Nature.
[30] Geoffrey Zweig, et al. Linguistic Regularities in Continuous Space Word Representations, 2013, NAACL.
[31] Timothy J. Hazen, et al. Speech-based annotation and retrieval of digital photographs, 2007, INTERSPEECH.