Learning Neural Audio Embeddings for Grounding Semantics in Auditory Perception
[1] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[2] John Haugeland. Artificial Intelligence: The Very Idea, 1985, MIT Press.
[3] Douglas Eck, et al. Temporal Pooling and Multiscale Learning for Automatic Annotation and Ranking of Music Audio, 2011, ISMIR.
[4] Ashwin K. Vijayakumar, et al. Sound-Word2Vec: Learning Word Representations Grounded in Sounds, 2017, EMNLP.
[5] Rada Mihalcea, et al. Going Beyond Text: A Hybrid Image-Text Approach for Measuring Word Relatedness, 2011, IJCNLP.
[6] Jean Maillard, et al. Black Holes and White Rabbits: Metaphor Identification with Visual Features, 2016, NAACL.
[7] Jason Weston, et al. Multi-Tasking with Joint Semantic Spaces for Large-Scale Music Annotation and Retrieval, 2011.
[8] Pascal Vincent, et al. Representation Learning: A Review and New Perspectives, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[9] Benjamin Schrauwen, et al. Deep content-based music recommendation, 2013, NIPS.
[10] Felix Hill, et al. Learning Abstract Concept Embeddings from Multi-Modal Data: Since You Probably Can’t See What I Mean, 2014, EMNLP.
[11] J. Flanagan. Speech Analysis, Synthesis and Perception, 1971.
[12] Patrick Pantel, et al. From Frequency to Meaning: Vector Space Models of Semantics, 2010, J. Artif. Intell. Res.
[13] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[14] George Trigeorgis, et al. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Stevan Harnad. The Symbol Grounding Problem, 1999, ArXiv.
[16] Jonathan Foote, et al. Content-based retrieval of music and audio, 1997, Proc. SPIE.
[17] Yansong Feng, et al. Visual Information in Semantic Representation, 2010, NAACL.
[18] Benjamin Schrauwen, et al. Audio-based Music Classification with a Pretrained Convolutional Network, 2011, ISMIR.
[19] Sabine Schulte im Walde, et al. A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities, 2013, EMNLP.
[20] Sander Dieleman, et al. Learning feature hierarchies for musical audio signals, 2015.
[21] Murat Akbacak, et al. Bag-of-Audio-Words Approach for Multimedia Event Classification, 2012, INTERSPEECH.
[22] Douglas D. O'Shaughnessy, et al. Speech communication: human and machine, 1987.
[23] Laura A. Dabbish, et al. Labeling images with a computer game, 2004, AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors.
[24] Samy Bengio, et al. Large-scale content-based audio retrieval from text queries, 2008, MIR '08.
[25] Silvia Bernardini, et al. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora, 2009, Lang. Resour. Evaluation.
[26] Thomas A. Schreiber, et al. The University of South Florida free association, rhyme, and word fragment norms, 2004, Behavior research methods, instruments, & computers: a journal of the Psychonomic Society, Inc.
[27] Douglas Eck, et al. Learning Features from Music Audio with Deep Belief Networks, 2010, ISMIR.
[28] Emiel van Miltenburg, et al. Sound-based distributional models, 2015, IWCS.
[29] Dong Yu, et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[30] Carina Silberer, et al. Grounded Models of Semantic Representation, 2012, EMNLP.
[31] Mark S. Seidenberg, et al. Semantic feature production norms for a large set of living and nonliving things, 2005, Behavior research methods.
[32] Emmanuel Dupoux, et al. Learning Words from Images and Speech, 2014.
[33] Stephen Clark, et al. Multi- and Cross-Modal Semantics Beyond Vision: Grounding in Auditory Perception, 2015, EMNLP.
[34] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.
[35] E. B. Newman, et al. A Scale for the Measurement of the Psychological Magnitude Pitch, 1937.
[36] Elia Bruni, et al. Multimodal Distributional Semantics, 2014, J. Artif. Intell. Res.
[37] T. Landauer, et al. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge, 1997, Psychological Review.
[38] Michael C. Hout, et al. Multidimensional Scaling, 2003, Encyclopedic Dictionary of Archaeology.
[39] Xavier Serra, et al. Freesound technical demo, 2013, ACM Multimedia.
[40] Marco Baroni, et al. Grounding Distributional Semantics in the Visual World, 2016, Lang. Linguistics Compass.
[41] Fabien Ringeval, et al. At the Border of Acoustics and Linguistics: Bag-of-Audio-Words for the Recognition of Emotions in Speech, 2016, INTERSPEECH.
[42] Julia Hirschberg, et al. V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure, 2007, EMNLP.
[43] Andrew Zisserman, et al. Video Google: a text retrieval approach to object matching in videos, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.
[44] D. Sculley, et al. Web-scale k-means clustering, 2010, WWW '10.
[45] Stephen Clark, et al. Vision and Feature Norms: Improving automatic feature norm learning through cross-modal maps, 2016, NAACL.
[46] Rob Fergus, et al. Visualizing and Understanding Convolutional Networks, 2013, ECCV.
[47] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[48] Stephen Clark, et al. Improving Multi-Modal Representations Using Image Dispersion: Why Less is Sometimes More, 2014, ACL.
[49] Murat Akbacak, et al. KDDI LABS and SRI International at TRECVID 2010: Content-Based Copy Detection, 2010, TRECVID.
[50] Stephen Clark, et al. Grounding Semantics in Olfactory Perception, 2015, ACL.
[51] Antti J. Eronen, et al. Musical instrument recognition using ICA-based transform of features and discriminatively trained HMMs, 2003, Seventh International Symposium on Signal Processing and Its Applications, 2003. Proceedings.
[52] Stephen Clark, et al. Visual Bilingual Lexicon Induction with Transferred ConvNet Features, 2015, EMNLP.
[53] Max M. Louwerse, et al. Symbol Interdependency in Symbolic and Embodied Cognition, 2011, Top. Cogn. Sci.
[54] Georgiana Dinu, et al. Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors, 2014, ACL.
[55] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[56] Stephen Clark, et al. Vector Space Models of Lexical Meaning, 2015.
[57] Nikolaus Kriegeskorte, et al. Representational Similarity Analysis: Connecting the Branches of Systems Neuroscience, 2008, Frontiers in Systems Neuroscience.
[58] Léon Bottou, et al. Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics, 2014, EMNLP.
[59] Mirella Lapata, et al. Incremental Models of Natural Language Category Acquisition, 2011, CogSci.
[60] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints, 2004, International Journal of Computer Vision.
[61] Stephen Clark, et al. Exploiting Image Generality for Lexical Entailment Detection, 2015, ACL.
[62] Alessandro Lenci, et al. Distributional semantics in linguistic and cognitive research, 2008.
[63] Brian Gygi, et al. Similarity and categorization of environmental sounds, 2007, Perception & psychophysics.
[64] Angeliki Lazaridou, et al. Combining Language and Vision with a Multimodal Skip-gram Model, 2015, NAACL.
[65] Robert A. Jacobs. Learning Multisensory Representations, 2016.