Document Clustering Algorithm Based on Modified Latent Semantic Analysis

This paper proposes a new document clustering algorithm based on modified latent semantic analysis. A new feature extraction method is used to construct the word-document matrix. Latent semantic analysis, which stems from linear algebra, performs a Singular Value Decomposition of the word-document matrix, so that unimportant information is filtered out and the high-dimensional representation of documents in the Vector Space Model is mapped to a low-dimensional representation in the latent semantic space. The modified latent semantic analysis converts the co-occurrence data into a probabilistic model, which improves clustering performance. Experimental results show that the proposed clustering algorithm is effective.
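
The baseline pipeline described above (word-document matrix, truncated Singular Value Decomposition, clustering in the latent space) can be illustrated with a minimal sketch. It covers only standard latent semantic analysis; the paper's feature extraction method and its probabilistic modification are not specified in this abstract and are not reproduced here. The function names `lsa_embed` and `kmeans`, the toy matrix, and the choice of k-means with farthest-first initialization are all assumptions made for illustration.

```python
import numpy as np

def lsa_embed(term_doc, k):
    """Map each document (column) to a k-dimensional latent vector via truncated SVD."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; scale the right singular vectors by them.
    return (s[:k, None] * Vt[:k, :]).T  # shape: (n_docs, k)

def kmeans(points, n_clusters, n_iter=20):
    """Plain k-means with deterministic farthest-first initialization (an assumption)."""
    centers = [points[0]]
    for _ in range(1, n_clusters):
        # Distance from every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = points[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels

# Hypothetical toy word-document matrix: rows are terms, columns are documents.
# Documents 0 and 1 share one vocabulary, documents 2 and 3 share another.
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)

doc_vectors = lsa_embed(term_doc, k=2)
labels = kmeans(doc_vectors, n_clusters=2)
print(labels)
```

On this toy input, documents 0 and 1 should fall into one cluster and documents 2 and 3 into the other, because the truncated decomposition discards the small within-topic differences in term counts.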