A Method of Improving Feature Vector for Web Pages Reflecting the Contents of Their Out-Linked Pages

TF-IDF schemes are widely used to generate feature vectors for documents. However, these schemes were designed to characterize a single document in isolation. To characterize Web pages with tf-idf schemes, the feature vector of a Web page should therefore also reflect the contents of the pages it links to via hyperlinks. In this paper, we propose three methods of generating feature vectors for linked documents such as Web pages. To verify the effectiveness of the proposed methods, we compare them with current search engines and evaluate their retrieval accuracy using recall-precision curves.
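
The abstract does not spell out the three proposed methods, so the following is only a minimal sketch of the general idea, not the authors' method. It computes plain tf-idf vectors and then adds to each page's vector a fraction of the mean vector of its out-linked pages. The function names, the `alpha` weight, the averaging rule, and the toy pages are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per page, using tf * log(N / df)."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of pages in which each term occurs.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[name] = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    return vectors


def blend_with_out_links(vectors, out_links, alpha=0.5):
    """Hypothetical blend: own vector plus alpha times the mean out-linked vector."""
    blended = {}
    for name, own in vectors.items():
        combined = dict(own)
        # Ignore links that point outside the collection.
        targets = [t for t in out_links.get(name, []) if t in vectors]
        for target in targets:
            for term, weight in vectors[target].items():
                share = alpha * weight / len(targets)
                combined[term] = combined.get(term, 0.0) + share
        blended[name] = combined
    return blended


# Toy collection (assumed data): page "a" links to "b"; "c" links to "a" and "b".
pages = {
    "a": "web search engine",
    "b": "feature vector retrieval",
    "c": "web page hyperlink",
}
links = {"a": ["b"], "b": [], "c": ["a", "b"]}

base = tfidf_vectors(pages)
improved = blend_with_out_links(base, links)
```

In this sketch, page "a" gains a nonzero weight for terms such as "retrieval" that appear only in the page it links to, while page "b", which has no out-links, keeps its original vector. Other weighting choices (for example, weighting by link position or anchor text) would fit the same structure.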