暂无分享,去创建一个
[1] Cordelia Schmid,et al. Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[3] Aren Jansen,et al. A segmental framework for fully-unsupervised large-vocabulary speech recognition , 2016, Comput. Speech Lang..
[4] Bolei Zhou,et al. Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.
[5] Deb Roy,et al. Grounded spoken language acquisition: experiments in word learning , 2003, IEEE Trans. Multim..
[6] Kenneth Ward Church,et al. Towards spoken term discovery at scale with zero resources , 2010, INTERSPEECH.
[7] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[8] Aren Jansen,et al. A comparison of neural network methods for unsupervised representation learning on the zero resource speech challenge , 2015, INTERSPEECH.
[9] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[11] Pietro Perona,et al. Towards automatic discovery of object categories , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).
[12] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[13] Emmanuel Dupoux,et al. Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner , 2016, Cognition.
[14] Mario Fritz,et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[15] James Glass,et al. Analysis of Audio-Visual Features for Unsupervised Speech Recognition , 2017 .
[16] Hugo Larochelle,et al. GuessWhat?! Visual Object Discovery through Multi-modal Dialogue , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[18] Antonio Torralba,et al. Anticipating Visual Representations from Unlabeled Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Aren Jansen,et al. Unsupervised neural network based feature extraction using weak top-down constraints , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Tian Zhang,et al. BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.
[21] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[23] Li Fei-Fei,et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Grzegorz Chrupala,et al. Representations of language in a model of visually grounded speech signal , 2017, ACL.
[25] James R. Glass,et al. A Nonparametric Bayesian Approach to Acoustic Model Discovery , 2012, ACL.
[26] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[27] Grzegorz Chrupala,et al. Encoding of phonology in a recurrent neural model of grounded speech , 2017, CoNLL.
[28] Lukás Burget,et al. Variational Inference for Acoustic Unit Discovery , 2016, Workshop on Spoken Language Technologies for Under-resourced Languages.
[29] James R. Glass,et al. Unsupervised Pattern Discovery in Speech , 2008, IEEE Transactions on Audio, Speech, and Language Processing.
[30] Bolei Zhou,et al. Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Grzegorz Chrupala,et al. From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning , 2016, COLING.
[32] Richard S. Zemel,et al. Exploring Models and Data for Image Question Answering , 2015, NIPS.
[33] Andrew Owens,et al. Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.
[34] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Bernt Schiele,et al. Generative Adversarial Text to Image Synthesis , 2016, ICML.
[36] Yann LeCun,et al. Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..
[37] Wei Xu,et al. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question , 2015, NIPS.
[38] Bolei Zhou,et al. Object Detectors Emerge in Deep Scene CNNs , 2014, ICLR.
[39] Hung-An Chang,et al. Resource configurable spoken query detection using Deep Boltzmann Machines , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] James R. Glass,et al. Unsupervised Learning of Spoken Language with Visual Context , 2016, NIPS.
[41] Aren Jansen,et al. Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[42] Alex Pentland,et al. Learning words from sights and sounds: a computational model , 2002, Cogn. Sci..
[43] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Mario Fritz,et al. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input , 2014, NIPS.
[45] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Aren Jansen,et al. Efficient spoken term discovery using randomized algorithms , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[47] Armand Joulin,et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping , 2014, NIPS.
[48] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[49] Dragomir Anguelov,et al. Self-taught object localization with deep networks , 2014, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).
[50] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[51] Alexei A. Efros,et al. Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[52] Ewan Dunbar,et al. A hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modeling , 2015, INTERSPEECH.
[53] Bolei Zhou,et al. Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] James R. Glass,et al. Learning Word-Like Units from Joint Audio-Visual Analysis , 2017, ACL.
[55] Olivier Gibaru,et al. CNN features are also great at unsupervised classification , 2017, ArXiv.
[56] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[57] Gregory Shakhnarovich,et al. Visually Grounded Learning of Keyword Prediction from Untranscribed Speech , 2017, INTERSPEECH.
[58] Elizabeth S. Spelke,et al. Principles of Object Perception , 1990, Cogn. Sci..
[59] Cordelia Schmid,et al. Unsupervised object discovery and localization in the wild: Part-based matching with bottom-up region proposals , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).