Using Wikipedia to learn semantic feature representations of concrete concepts in neuroimaging experiments

In this paper we show that a corpus of a few thousand Wikipedia articles about concrete or visualizable concepts can be used to produce a low-dimensional semantic feature representation of those concepts. The purpose of such a representation is to serve as a model of the mental context of a subject during functional magnetic resonance imaging (fMRI) experiments. A recent study [19] showed that it was possible to predict fMRI data acquired while subjects thought about a concrete concept, given a representation of those concepts in terms of semantic features obtained with human supervision. We use topic models on our corpus to learn semantic features from text in an unsupervised manner, and show that those features can outperform those in [19] in demanding 12-way and 60-way classification tasks. We also show that these features can be used to uncover similarity relations in brain activation for different concepts which parallel those relations in behavioral data from human subjects.
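As a rough illustration of learning low-dimensional semantic features from text with a topic model, the sketch below fits a small latent Dirichlet allocation (LDA) model by collapsed Gibbs sampling and reads off per-document topic proportions as the feature vector for each concept. It is a minimal sketch under stated assumptions, not the pipeline used in the paper: the four-document toy corpus, the function name `lda_gibbs`, and the hyperparameters (`alpha`, `beta`, `iters`, two topics) are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, hypothetical helper).

    docs: list of token lists, one per concept article.
    Returns one topic-proportion vector per document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for words in docs for w in words})

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assign = []
    for d, words in enumerate(docs):
        z_d = []
        for w in words:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)

    for _ in range(iters):
        for d, words in enumerate(docs):
            for i, w in enumerate(words):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic proportions serve as the semantic feature vector.
    features = []
    for d, words in enumerate(docs):
        denom = len(words) + num_topics * alpha
        features.append(
            [(doc_topic[d][k] + alpha) / denom for k in range(num_topics)]
        )
    return features


# Toy stand-in for a corpus of articles about concrete concepts (invented).
corpus = {
    "hammer": "tool handle metal nail strike tool".split(),
    "saw": "tool blade metal cut wood tool".split(),
    "dog": "animal fur tail bark pet animal".split(),
    "cat": "animal fur tail purr pet animal".split(),
}
concepts = list(corpus)
features = lda_gibbs([corpus[c] for c in concepts], num_topics=2)

for concept, vec in zip(concepts, features):
    print(concept, [round(p, 3) for p in vec])
```

Each concept ends up with a vector of length `num_topics` whose entries sum to one. In a setting like the one described above, such vectors would play the role of the semantic features from which brain activation is predicted; a realistic corpus would need far more text, topics, and sampling iterations than this toy example.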

[1]  Felix Naumann, et al.  Data fusion, 2009, CSUR.

[2]  A. Paivio, et al.  Concreteness, imagery, and meaningfulness values for 925 nouns, 1968, Journal of experimental psychology.

[3]  L. Barsalou.  Grounded cognition, 2008, Annual review of psychology.

[4]  Geoffrey Leech, et al.  CLAWS4: The Tagging of the British National Corpus, 1994, COLING.

[5]  Ryan J. Prenger, et al.  Bayesian Reconstruction of Natural Images from Human Brain Activity, 2009, Neuron.

[6]  Anna Korhonen, et al.  Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora, 2010, HLT-NAACL 2010.

[7]  G. Rees, et al.  Neuroimaging: Decoding mental states from brain activity in humans, 2006, Nature Reviews Neuroscience.

[8]  Tom M. Mitchell, et al.  Machine learning classifiers and fMRI: A tutorial overview, 2009, NeuroImage.

[9]  Tom Michael Mitchell, et al.  Predicting Human Brain Activity Associated with the Meanings of Nouns, 2008, Science.

[10]  Michael I. Jordan, et al.  Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[11]  Jean-Baptiste Poline, et al.  Inverse retinotopy: Inferring the visual content of images from brain activation patterns, 2006, NeuroImage.

[12]  Roberto Navigli, et al.  Additional Key Words and Phrases: Word sense disambiguation, word sense discrimination, WSD, lexical semantics, lexical ambiguity, sense annotation, semantic annotation, 2009.

[13]  Tom M. Mitchell, et al.  Learning to Decode Cognitive States from Brain Images, 2004, Machine Learning.

[14]  Susan T. Dumais, et al.  The latent semantic analysis theory of knowledge, 1997.

[15]  Massimo Poesio, et al.  EEG responds to conceptual stimuli and corpus semantics, 2009, EMNLP.

[16]  John A. Carroll, et al.  Applied morphological processing of English, 2001, Natural Language Engineering.

[17]  Ken McRae, et al.  Category-specific semantic deficits, 2008.

[18]  S. Dennis, et al.  What is free association and what does it measure?, 2000, Memory & cognition.

[19]  Patrick Pantel, et al.  From Frequency to Meaning: Vector Space Models of Semantics, 2010, J. Artif. Intell. Res.

[20]  Richard M. Shiffrin, et al.  Word Association Spaces for Predicting Semantic Similarity Effects in Episodic Memory, 2005.

[21]  Francisco Pereira, et al.  Learning semantic features for fMRI data from definitional text, 2010, HLT-NAACL 2010.

[22]  Mark Steyvers, et al.  Topics in semantic representation, 2007, Psychological review.

[23]  W. Montague, et al.  Category norms of verbal items in 56 categories: A replication and extension of the Connecticut category norms, 1969.

[24]  Anna Korhonen, et al.  Acquiring Human-like Feature-Based Conceptual Representations from Corpora, 2010, HLT-NAACL 2010.

[25]  J. Gallant, et al.  Identifying natural images from human brain activity, 2008, Nature.

[26]  Han Liu, et al.  Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery, 2009, ICML '09.

[27]  G. Murphy, et al.  The Big Book of Concepts, 2002.

[28]  Mark S. Seidenberg, et al.  Semantic feature production norms for a large set of living and nonliving things, 2005, Behavior research methods.

[29]  Katherine A. Rawson, et al.  Category Norms: An Updated and Expanded Version of the Battig and Montague (1969) Norms, 2004.

[30]  Deb Roy, et al.  Connecting language to the world, 2005, Artif. Intell.

[31]  Tom M. Mitchell, et al.  Quantitative modeling of the neural representation of objects: How semantic feature norms can account for fMRI activation, 2011, NeuroImage.

[32]  Roberto Navigli, et al.  Word sense disambiguation: A survey, 2009, CSUR.

[33]  Masa-aki Sato, et al.  Visual Image Reconstruction from Human Brain Activity using a Combination of Multiscale Local Image Decoders, 2008, Neuron.

[34]  A. Ishai, et al.  Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex, 2001, Science.

[35]  Allan Paivio, et al.  Extensions of the Paivio, Yuille, and Madigan (1968) norms, 2004, Behavior research methods, instruments, & computers: a journal of the Psychonomic Society, Inc.

[36]  Sean M. Polyn, et al.  Beyond mind-reading: multi-voxel pattern analysis of fMRI data, 2006, Trends in Cognitive Sciences.