Knowledge Discovery and Retrieval on World Wide Web Using Web Structure Mining

The World Wide Web is nearing omnipresence. The explosively growing volume of Web content, including digitized manuals, emails, pictures, multimedia, and Web services, requires a distinct and elaborate structural framework that can provide a navigational surrogate for clients as well as for servers. Because of the increasing amount of data available online, the World Wide Web has become one of the most valuable resources for information retrieval and knowledge discovery. Web mining technologies are well suited to knowledge discovery on the Web. The knowledge extracted from the Web can be used to improve the performance of Web information retrieval, question answering, and Web-based data warehousing. In this paper, we provide an introduction to Web mining as well as a review of the Web mining categories. We then focus on one of these categories: Web structure mining. Within this category, we introduce link mining and review two popular methods applied in Web structure mining: HITS and PageRank.
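As a rough illustration of the two link-analysis methods named above, the following Python sketch runs PageRank and HITS by simple fixed-point iteration on a tiny hand-made link graph. It is not taken from the paper: the graph `toy_web`, the function names, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration, and every link target is assumed to appear as a key in the graph.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # page -> pages it links to (hypothetical representation)


def pagerank(graph: Graph, damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Iterate the PageRank update; scores stay normalized to sum to 1."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly (one common convention).
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                # Each page divides its rank equally among its out-links.
                new_rank[w] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank


def _normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = sum(s * s for s in scores.values()) ** 0.5 or 1.0
    return {v: s / norm for v, s in scores.items()}


def hits(graph: Graph, iterations: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Alternate authority and hub updates; returns (hub, authority) scores."""
    nodes = list(graph)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        auth = {v: 0.0 for v in nodes}
        for v in nodes:
            for w in graph[v]:
                auth[w] += hub[v]
        auth = _normalize(auth)
        # Hub score of a page: sum of authority scores of the pages it links to.
        hub = _normalize({v: sum(auth[w] for w in graph[v]) for v in nodes})
    return hub, auth


# Hypothetical four-page web used only for this example.
toy_web: Graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

if __name__ == "__main__":
    print(pagerank(toy_web))
    print(hits(toy_web))
```

In this toy graph, page C receives links from three of the four pages, so both methods rank it highest as a target; HITS additionally separates pages that link well (hubs) from pages that are linked to (authorities), whereas PageRank assigns a single score per page.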
