Strengths, Limitations, and Extensions of LSA

The strengths of Latent Semantic Analysis (LSA) (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990; Landauer & Dumais, 1997) have been demonstrated in many applications, several of which are described in this book. This chapter briefly describes how LSA has been effectively integrated into applications developed at the Institute for Intelligent Systems at the University of Memphis. The chapter then identifies two weaknesses in the current use of LSA and proposes methods to overcome them. The first weakness concerns the statistical properties of an LSA space when cosine values are used as a measure of similarity; the second concerns the limited use of the dimensional information in the vector representation. With respect to the statistical properties, we propose standardizing the cosine match when measuring similarity between documents, where the standardization is based on both the statistical properties of the LSA space and the properties of the specific application. With respect to the dimensional information in LSA vectors, we propose three methods of computing document similarity that adapt to (1) learner perspective, (2) context, and (3) conversational history. These adaptive methods are assessed by examining the relationship between the LSA similarity measure and keyword-match-based similarity measures. We argue that LSA can be more powerful if such extensions are appropriately used in applications.
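
The abstract gives no formulas, so the following Python sketch only illustrates the two proposed extensions as described above: standardizing a raw cosine against the empirical distribution of cosines in the LSA space, and weighting dimensions before computing a cosine. The function names and the pair-sampling estimate of the space's cosine distribution are illustrative assumptions, not the chapter's actual algorithms.

```python
import numpy as np

def cosine(u, v):
    """Raw cosine between two LSA document vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def space_cosine_stats(doc_vectors, n_pairs=10_000, seed=0):
    """Estimate the mean and standard deviation of cosines between
    randomly sampled document pairs, as a stand-in for the statistical
    properties of the LSA space (an assumed estimation strategy)."""
    rng = np.random.default_rng(seed)
    n = doc_vectors.shape[0]
    pairs = rng.integers(0, n, size=(n_pairs, 2))
    samples = [cosine(doc_vectors[i], doc_vectors[j])
               for i, j in pairs if i != j]
    return float(np.mean(samples)), float(np.std(samples))

def standardized_cosine(u, v, mu, sigma):
    """Standardized similarity: how many standard deviations a raw
    cosine lies above the typical cosine in this space."""
    return (cosine(u, v) - mu) / sigma

def weighted_cosine(u, v, weights):
    """Dimension-weighted cosine. The weight vector is a placeholder
    for whatever the application derives from learner perspective,
    context, or conversational history."""
    uw, vw = u * weights, v * weights
    return float(uw @ vw / (np.linalg.norm(uw) * np.linalg.norm(vw)))

# Example with a random 300-dimensional "space" of 1,000 documents.
docs = np.random.default_rng(1).normal(size=(1000, 300))
mu, sigma = space_cosine_stats(docs)
z = standardized_cosine(docs[0], docs[1], mu, sigma)
```

In this sketch, a standardized value near 0 means the two documents are no more similar than a random pair drawn from the space, which is the kind of space- and application-relative calibration the proposed standardization is meant to provide.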

[1] Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. A. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 1990.

[2] Landauer, T. K., & Dumais, S. T. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge, 1997.

[3] Graesser, A. C., et al. Using Latent Semantic Analysis to Evaluate the Contributions of Students in AutoTutor. Interactive Learning Environments, 2000.

[4] Zha, H., et al. Matrices with Low-Rank-Plus-Shift Structure: Partial SVD and Latent Semantic Indexing. SIAM Journal on Matrix Analysis and Applications, 1999.

[5] Kintsch, W., et al. Are Good Texts Always Better? Interactions of Text Coherence, Background Knowledge, and Levels of Understanding in Learning From Text, 1996.

[6] Hu, X., et al. LSA: First Dimension and Dimensional Weighting, 2003.

[7] Graesser, A. C., et al. Modules and Information Retrieval Facilities of the Human Use Regulatory Affairs Advisor (HURAA), 2004.

[8] Graesser, A. C., et al. A Revised Algorithm for Latent Semantic Analysis. IJCAI, 2003.

[9] Burgess, C., et al. From Simple Associations to the Building Blocks of Language: Modeling Meaning in Memory with the HAL Model, 1998.

[10] Buckley, C., et al. New Retrieval Approaches Using SMART: TREC 4. TREC, 1995.

[11] Graesser, A. C., et al. Using LSA in AutoTutor: Learning Through Mixed-Initiative Dialogue in Natural Language, 2007.

[12] Graesser, A. C., et al. Coh-Metrix: Analysis of Text on Cohesion and Language. Behavior Research Methods, Instruments, & Computers, 2004.

[13] Cai, Z., et al. NLS: A Non-Latent Similarity Algorithm, 2004.

[14] Wilson, M. MRC Psycholinguistic Database, 2001.

[15] Coltheart, M. The MRC Psycholinguistic Database, 1981.