Interval set clustering of web users using modified Kohonen self-organizing maps based on the properties of rough sets

Web usage mining involves application of data mining techniques to discover usage patterns from the web data. Clustering is one of the important functions in web usage mining. The likelihood of bad or incomplete web usage data is higher than the conventional applications. The clusters and associations in web usage mining do not necessarily have crisp boundaries. Researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have adapted the K-means clustering algorithm as well as genetic algorithms based on rough sets to find interval sets of clusters. The genetic algorithms based clustering may not be able to handle large amounts of data. The K-means algorithm does not lend itself well to adaptive clustering. This paper proposes an adaptation of Kohonen self-organizing maps based on the properties of rough sets, to find the interval sets of clusters. Experiments are used to create interval set representations of clusters of web visitors on three educational web sites. The proposed approach has wider applications in other areas of web mining as well as data mining.

[1]  Pawan Lingras,et al.  Rough set clustering for Web mining , 2002, 2002 IEEE World Congress on Computational Intelligence. 2002 IEEE International Conference on Fuzzy Systems. FUZZ-IEEE'02. Proceedings (Cat. No.02CH37291).

[2]  James M. Keller,et al.  A possibilistic approach to clustering , 1993, IEEE Trans. Fuzzy Syst..

[3]  R.J. Hathaway,et al.  Switching regression models and fuzzy clustering , 1993, IEEE Trans. Fuzzy Syst..

[4]  Zdzislaw Pawlak,et al.  Rough classification , 1984, Int. J. Hum. Comput. Stud..

[5]  Anupam Joshi,et al.  Robust Fuzzy Clustering Methods to Support Web Mining , 1998 .

[6]  Anita Patil-Deshmukh,et al.  Maps , 1992 .

[7]  Andrzej Skowron,et al.  New Directions in Rough Sets, Data Mining, and Granular-Soft Computing , 1999, Lecture Notes in Computer Science.

[8]  Teuvo Kohonen,et al.  Self-Organizing Maps , 2010 .

[9]  Andrzej Skowron,et al.  Information Granules in Distributed Environment , 1999, RSFDGrC.

[10]  Pawan Lingras,et al.  Interval Set Clustering of Web Users with Rough K-Means , 2004, Journal of Intelligent Information Systems.

[11]  Pawan Lingras,et al.  Unsupervised Rough Set Classification Using GAs , 2001, Journal of Intelligent Information Systems.

[12]  Teuvo Kohonen,et al.  Self-Organization and Associative Memory , 1988 .

[13]  Jaideep Srivastava,et al.  Web usage mining: discovery and applications of usage patterns from Web data , 2000, SKDD.

[14]  Pawan Lingras,et al.  Statistical, Evolutionary, and Neurocomputing Clustering Techniques: Cluster-Based vs Object-Based Approaches , 2005, Artificial Intelligence Review.

[15]  Hichem Frigui,et al.  Fuzzy and possibilistic shell clustering algorithms and their application to boundary detection and surface approximation. II , 1995, IEEE Trans. Fuzzy Syst..