A Novel Two-Phase SOM Clustering Approach to Discover Visitor Interests in a Website

Mining content, structure, and usage data from websites can uncover the browsing patterns that different groups of Web visitors follow to reach the subjects that are truly valuable to them. Many works in the literature have focused on proposing new similarity measures to cluster Web logs and detect segments of browsing behavior. However, such log-based clustering does not reveal which contents visitors are interested in, since a single Web page may contain many different topics. In this paper, a novel two-phase clustering approach based on Self-Organizing Maps (SOM) is proposed to address this problem. A systematic process for preparing Web content data for clustering is also described.