Extraction of topic transition from document stream based on hierarchical clustering

Abstract We propose a method for extracting keywords that express topic transitions from a document stream, such as news articles, based on hierarchical clustering and the C-value method for constructing compound words. In a user evaluation of 640 topics extracted over 32 days, users could understand 94.3% of the topics as news and recognized a topic transition in 68.6% of them.
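
As a rough illustration of the C-value idea mentioned in the abstract, the sketch below scores candidate compound words with the commonly cited C-value formula: a candidate's frequency is reduced by the average frequency of the longer candidates that contain it, and the result is weighted by log2 of its length in words. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names `contains` and `c_values`, the representation of candidates as word tuples, and the toy frequencies are all hypothetical, and how the paper combines these scores with hierarchical clustering is not shown.

```python
import math


def contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous run of words in `longer`."""
    m = len(shorter)
    return any(longer[i:i + m] == shorter for i in range(len(longer) - m + 1))


def c_values(freq):
    """Score candidate terms with the basic C-value formula.

    `freq` maps a candidate term (a tuple of words) to its frequency.
    Assumption: the plain log2(length) weight is used, so single-word
    candidates always score 0; some variants use log2(length + 1) instead.
    """
    scores = {}
    for term, f in freq.items():
        # Frequencies of the longer candidates in which this term is nested.
        nesting = [
            freq[other]
            for other in freq
            if len(other) > len(term) and contains(other, term)
        ]
        if nesting:
            # Discount by the mean frequency of the containing candidates.
            adjusted = f - sum(nesting) / len(nesting)
        else:
            adjusted = f
        scores[term] = math.log2(len(term)) * adjusted
    return scores


# Toy frequencies, invented purely for illustration.
freq = {
    ("prime", "minister"): 5,
    ("prime", "minister", "election"): 2,
    ("election",): 4,
}
scores = c_values(freq)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 3))
```

In this toy input, "prime minister" loses the 2 occurrences attributed to the longer candidate that contains it, while the three-word candidate keeps its full frequency and gains a larger length weight.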