Data preprocessing algorithm for Web Structure Mining

The World Wide Web is an extremely large collection of information that can serve almost any user's need. It is growing rapidly, with approximately 70 million pages added daily. Knowledge discovery on web data is referred to as Web Mining. Web Structure Mining is based on the analysis of patterns in the hyperlink structure of the web. Like Data Mining, Web Mining has four stages: Data Collection, Preprocessing, Knowledge Discovery, and Knowledge Analysis. This paper focuses on the first two stages, Data Collection and Preprocessing. Data collection gathers the data required for analysis. Data preprocessing is considered an important stage of Web Structure Mining because the data available on the web is unstructured, heterogeneous, and noisy.
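To make the preprocessing stage concrete, the following is a minimal sketch of one plausible cleaning step for hyperlink data. It is not the algorithm proposed in the paper, whose details are not given in this abstract. The names `LinkExtractor` and `preprocess_links` are hypothetical, and the specific cleaning rules (resolve relative links, drop fragments, keep only HTTP(S) links, remove duplicates and self-links) are assumptions about what "noisy" link data might require.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkExtractor(HTMLParser):
    """Collects the raw href values of all <a> tags in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def preprocess_links(base_url, html):
    """Return cleaned, de-duplicated outgoing links of one page (illustrative rules only)."""
    parser = LinkExtractor()
    parser.feed(html)

    seen = set()
    cleaned = []
    for href in parser.hrefs:
        # Resolve relative links against the page URL and drop the #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        parts = urlparse(absolute)
        # Assumption: only http/https links belong to the web graph.
        if parts.scheme not in ("http", "https"):
            continue
        # Host names are case-insensitive, so lowercase them before comparing.
        normalized = parts._replace(netloc=parts.netloc.lower()).geturl()
        # Skip self-links and links already seen on this page.
        if normalized == base_url or normalized in seen:
            continue
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned
```

Calling `preprocess_links(page_url, page_html)` for each collected page yields one adjacency list per page, which together form the link graph that later structure-mining stages would analyze.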
