A Web Document Classification Approach Based on Fuzzy Association Concept

This paper proposes a method for automatically identifying the topics of Web documents by means of classification. Web documents vary unpredictably in length, quality and authorship. Motivated by these fuzzy characteristics, we adopt the fuzzy association concept to classify documents into predefined categories (topics). Experimental results show that our approach yields higher classification accuracy than the vector space model.
