Web Page Similarity Searching Based on Web Content

The application discussed in this paper finds web pages whose content is similar to that of a web page specified by its URL. An automated process for crawling web pages was also developed; once activated, the crawler runs continuously. A search begins when the user enters a URL: the web page at that URL is retrieved and its content is extracted to obtain the keywords that represent the page. These keywords are reduced to their base form using the Porter stemming algorithm, and the TF-IDF method is used to determine the importance of each keyword. The Jaccard coefficient is then used to measure the similarity between web pages. The application is limited to English-language web pages. Based on the test results, the authors conclude that the application works well and can be used in practice.
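The abstract names the processing steps (keyword extraction, stemming, TF-IDF weighting, Jaccard coefficient) but not how they fit together. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes each page is represented by its top-k TF-IDF terms and that two pages are compared by the Jaccard coefficient of those keyword sets, |A ∩ B| / |A ∪ B|. The stop-word list, the TF-IDF variant tf · log(N / df), the value k = 6, the toy page texts, and all function names are hypothetical. The Porter stemming step is omitted so the example needs no third-party packages.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "are", "at", "for", "from", "in", "is",
              "of", "on", "the", "their", "to", "with"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep alphabetic tokens that are not stop words.

    Stemming is omitted to keep the sketch dependency-free; the paper's
    pipeline would reduce each token with the Porter stemmer at this point.
    """
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term t in document d as tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: count * math.log(n / df[t]) for t, count in Counter(doc).items()}
            for doc in docs]


def top_keywords(weights: dict[str, float], k: int) -> set[str]:
    """Keep the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(((t, w) for t, w in weights.items() if w > 0),
                    key=lambda item: (-item[1], item[0]))
    return {t for t, _ in ranked[:k]}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy "crawled" collection; the first entry plays the role of the query page.
pages = {
    "query": "Web crawler crawls web pages and extracts keywords from web content.",
    "crawler_page": "A crawler downloads web pages; keywords are extracted from page content.",
    "stemming_page": "Porter stemming reduces English words to their stems.",
    "recipe_page": "Recipes for baking bread and cakes at home.",
}
names = list(pages)
weights = tf_idf([tokenize(text) for text in pages.values()])
keywords = {name: top_keywords(w, k=6) for name, w in zip(names, weights)}

# Score every other page against the query page's keyword set.
for name in names[1:]:
    print(name, round(jaccard(keywords["query"], keywords[name]), 3))
```

On this toy collection the crawler page scores 0.333 against the query page, and the two unrelated pages score 0.0. The example also hints at why stemming matters: without it, `extracts` and `extracted` count as different keywords, whereas the Porter stemmer would reduce both to `extract` and raise the overlap.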

Justinus Andjarwirawan | Gregorius Satiabudhi | Rubia Sari Setiadi