From Frequency to Meaning: Vector Space Models of Semantics

Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
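As a rough illustration of the first of these three matrix classes, the sketch below builds a term-document frequency matrix from a toy corpus, with one row per term and one column per document. The helper name `term_document_matrix`, the whitespace tokenization, and the example sentences are assumptions made for this illustration only; they are not taken from the survey or from any of the open source projects it discusses.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, matrix), where matrix[i][j] is the raw count of
    term vocab[i] in document docs[j]. Hypothetical helper for illustration."""
    # Naive tokenization (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    # Sorted vocabulary gives a stable row order.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    # Rows are terms, columns are documents.
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


# Toy corpus (made up for this example).
docs = ["the cat sat", "the dog sat", "the cat chased the dog"]
vocab, matrix = term_document_matrix(docs)

for term, row in zip(vocab, matrix):
    print(f"{term:>7} {row}")
```

Each column is a bag-of-words vector for one document, and each row records how a term is distributed across documents. Word-context and pair-pattern matrices follow the same row-by-column idea, with different choices of what the rows and columns represent.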
