Latent semantic analysis
暂无分享,去创建一个
We now described an interesting application of SVD to text do cuments. Suppose we represent documents as a bag of words, soXij is the number of times word j occurs in document i, for j = 1 : W andi = 1 : D, where W is the number of words and D is the number of documents. To find a document that contains a g iven word, we can use standard search procedures, but this can get confuse d by ynonomy (different words with the same meaning) andpolysemy (same word with different meanings). An alternative approa ch is to assume that X was generated by some low dimensional latent representation X̂ ∈ IR, whereK is the number of latent dimensions. If we compare documents in the latent space, we should get improved retrie val performance, because words of similar meaning get mapped to similar low dimensional locations. We can compute a low dimensional representation of X by computing the SVD, and then taking the top k singular values/ vectors: 1
[1] Genevieve Gorrell,et al. Generalized Hebbian Algorithm for Incremental Singular Value Decomposition in Natural Language Processing , 2006, EACL.
[2] Susan T. Dumais,et al. Using latent semantic analysis to improve information retrieval , 1988, CHI 1988.
[3] P. W. Foltz,et al. Using latent semantic indexing for information filtering , 1990, COCS '90.