Data Preparation for Web Mining – A Survey

Web mining is commonly divided into three main areas: web content mining, web structure mining, and web usage mining. Web content mining extracts information from the contents of web pages and supports tasks such as knowledge synthesis. Web structure mining applies graph theory to understand the structure and hierarchy of websites. Web usage mining extracts useful information from sources such as server logs in order to understand what users do while on the internet. This paper surveys recent work on cleaning and preparing the data that feeds these three types of mining.