Decision trees for pernicious pages detection

This work presents an application framework for web usage analysis based on data mining, and proposes decision trees for analyzing web user behavior. The analysis covers two tasks: predicting a user's future actions and identifying the typical pages that lead to browsing termination. The widely used decision tree package C4.5 was employed in this study. In the new area of web log mining, decision trees showed reasonable computational performance and accuracy. Experiments showed that it is possible to predict future user actions with a reasonable misclassification error, and to find combinations of sequentially visited pages that result in browsing termination. In addition, decision trees generate human-readable rules that can be analyzed further to improve the web site.
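To make the setup concrete, the following is a minimal sketch of how browsing sessions might be turned into training rows for a decision tree learner. It is an illustration under stated assumptions, not the encoding used in the study: the sliding-window representation, the `EXIT` and `NONE` markers, the function names, and the sample sessions are all hypothetical. It also scores attributes with plain information gain for brevity, whereas C4.5 itself uses the gain ratio.

```python
from collections import Counter
from math import log2

EXIT = "EXIT"  # hypothetical label: the user terminated browsing
NONE = "NONE"  # hypothetical padding for pages before the session start


def make_rows(sessions, window=2):
    """Turn each session into (previous `window` pages, next action) rows."""
    rows = []
    for pages in sessions:
        padded = [NONE] * window + list(pages) + [EXIT]
        for i in range(window, len(padded)):
            rows.append((tuple(padded[i - window:i]), padded[i]))
    return rows


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, position):
    """Entropy reduction obtained by splitting on the page at `position`."""
    groups = {}
    for features, label in rows:
        groups.setdefault(features[position], []).append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([label for _, label in rows]) - remainder


# Made-up sessions, purely for illustration.
sessions = [
    ["home", "products", "cart"],
    ["home", "products", "cart"],
    ["home", "about"],
    ["products", "cart"],
]
rows = make_rows(sessions, window=2)

# Rank window positions by how much they tell us about the next action.
gains = {pos: information_gain(rows, pos) for pos in range(2)}
```

Rows whose label is `EXIT` correspond to page sequences that ended a session, so a tree trained on such rows would yield rules of the form "if the last two pages were X and Y, the user leaves", which is the kind of human-readable output the abstract refers to.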