Research on Classifying Web Information with Latent Semantic Based on Modified LDA Model

This work analyzes methods that use the Latent Dirichlet Allocation (LDA) model to uncover implicit semantic dimensions. Based on this analysis, we propose a modified LDA model, E-LDA, that accounts for the timeliness of Web information and the variation of topics over time. E-LDA incorporates both a time dimension and a topic dimension. It first discretizes the document set into time slices and then analyzes the latent semantic information within each slice. For the current time slice, we count the topics for which information is already known and apply the E-LDA model to the corresponding document sets. Experimental results show that E-LDA can classify topic-related Web documents according to their latent semantics and, furthermore, can track topic trends through their distribution and popularity at different times.
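
The time-slicing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and performs no LDA inference. It only shows, under stated assumptions, how documents might be grouped into discrete time slices, how the topics with known information could be counted per slice, and how topic popularity could be compared across slices. The record layout `(timestamp, text, topic_label)`, the fixed slice length, and all function names are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def slice_documents(docs, start, slice_length):
    """Group (timestamp, text, topic_label) records into discrete time slices.

    Assumption: slices have a fixed length and are indexed from `start`.
    """
    slices = defaultdict(list)
    for timestamp, text, label in docs:
        # Dividing one timedelta by another with // yields an integer index.
        index = (timestamp - start) // slice_length
        slices[index].append((text, label))
    return dict(slices)


def known_topic_counts(slices):
    """Count the distinct topics with known labels in each slice.

    Assumption: a label of None marks a document whose topic is unknown.
    """
    return {
        index: len({label for _, label in items if label is not None})
        for index, items in slices.items()
    }


def topic_popularity(slices):
    """Share of labelled documents per topic within each slice."""
    popularity = {}
    for index, items in slices.items():
        counts = Counter(label for _, label in items if label is not None)
        total = sum(counts.values())
        popularity[index] = (
            {topic: n / total for topic, n in counts.items()} if total else {}
        )
    return popularity
```

In a fuller pipeline, the per-slice topic count could serve as the number of topics when fitting a topic model to that slice's documents, and the per-slice popularity values could be compared across slices to follow a topic's trend. How E-LDA itself does this is not specified in the abstract.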