Learning and Representing Verbal Meaning

Latent semantic analysis (LSA) is a theory of how word meaning—and possibly other knowledge—is derived from the statistics of experience, and of how passage meaning is represented by combinations of words. Given a large and representative sample of text, LSA combines the ways thousands of words are used in thousands of contexts to map each word to a point in a common semantic space. LSA goes beyond pairwise co-occurrence or correlation to find latent dimensions of meaning that best relate every word and passage to every other. After learning from comparable bodies of text, LSA has scored almost as well as humans on vocabulary and subject-matter tests, accurately simulated many aspects of human judgment and behavior based on verbal meaning, and been successfully applied to measure the coherence and conceptual content of text. The surprising success of LSA has implications for the nature of generalization and language.
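The core mechanism described above—projecting a word-by-context count matrix onto a few latent dimensions—is typically carried out with a truncated singular value decomposition. The following is a minimal sketch, not the authors' implementation: the toy corpus, the vocabulary, and the choice of k = 2 dimensions are all illustrative assumptions.

```python
import numpy as np

# Hypothetical toy word-by-context count matrix (rows: words, columns:
# contexts). Real LSA uses thousands of words and thousands of contexts.
words = ["dog", "cat", "pet", "car", "engine"]
X = np.array([
    [2, 1, 0, 0],   # dog
    [1, 2, 0, 0],   # cat
    [1, 1, 0, 0],   # pet
    [0, 0, 2, 1],   # car
    [0, 0, 1, 2],   # engine
], dtype=float)

# Truncated SVD: keep only k latent dimensions of meaning.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]   # each word becomes a point in k-dim space

def cosine(a, b):
    """Similarity of two word points in the latent semantic space."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Words used in similar contexts land close together in the latent space,
# even when they never co-occur in the same context.
print(cosine(word_vecs[0], word_vecs[1]))   # dog vs. cat: high
print(cosine(word_vecs[0], word_vecs[3]))   # dog vs. car: near zero
```

Because the truncated decomposition pools evidence across all contexts at once, two words can end up with similar vectors through shared neighbors alone, which is how LSA goes beyond raw pairwise co-occurrence counts.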