Extracting preference terms from web browsing histories excluding pages unrelated to users' interests

Personalization is one of the most significant challenges on the World Wide Web. Extracting preference terms, chiefly from Web browsing histories, is a first step toward personalization. However, some of the Web pages in a browsing history contain information unrelated to the user's interests, and such pages are likely to blur personalization results. In this paper, we propose an approach that extracts preference terms from Web browsing histories while excluding pages unrelated to the users' interests. The proposed approach consists of two main steps. The first is a page classification step, which uses both URL expressions and keyphrase frequencies in the page to eliminate keyphrases derived from pages unrelated to the users' interests. The second is a keyphrase scoring step, which exploits the document frequency of terms to obtain preference terms. Our empirical study with 5 participants over a period of 4 weeks indicates that the proposed approach is more effective for users with specific Web browsing styles.
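The two-step structure described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the URL patterns, the repetition threshold, and the function names `is_related` and `preference_terms` are hypothetical, and the scoring simply counts how many retained pages contain each keyphrase, which is only one possible way to exploit document frequency.

```python
import re
from collections import Counter

# Hypothetical URL expressions for pages assumed to be unrelated to interests
# (login forms, search result pages, advertisements). Illustrative only.
UNRELATED_URL_PATTERNS = [
    re.compile(p) for p in (r"/login\b", r"/search\?", r"/ads?/")
]


def is_related(url, keyphrases, min_repeated=1):
    """Step 1 (assumed form): classify a page by URL and keyphrase frequency.

    A page is rejected if its URL matches an 'unrelated' pattern, or if fewer
    than `min_repeated` distinct keyphrases occur at least twice in the page.
    """
    if any(p.search(url) for p in UNRELATED_URL_PATTERNS):
        return False
    counts = Counter(keyphrases)
    repeated = sum(1 for c in counts.values() if c >= 2)
    return repeated >= min_repeated


def preference_terms(history, top_n=5):
    """Step 2 (assumed form): score keyphrases by document frequency.

    `history` is a list of (url, keyphrases) pairs. Each keyphrase is scored
    by the number of retained pages in which it appears.
    """
    doc_freq = Counter()
    for url, keyphrases in history:
        if is_related(url, keyphrases):
            doc_freq.update(set(keyphrases))  # count each page at most once
    return doc_freq.most_common(top_n)
```

Under these assumptions, a keyphrase that recurs across many retained pages ranks highest, while keyphrases from pages rejected in the first step never contribute to the score.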