Clustering Web Logs Using Similarity Upper Approximation with Different Similarity Measures

This paper applies similarity upper approximation based clustering to web logs using several similarity and distance measures, and demonstrates the viability of the approach. Web logs capture information about the web sites visited as well as the sequence of the visits. The sequence of visits provides important insight into the behavior of the user. Rough set theory, a soft computing technique, deals with the vagueness present in data and captures indiscernibility at different levels of granularity. Results are reported on the msnbc data set with different similarity measures, along with an explanation of the results.

Index Terms: Clustering, sequential data, similarity upper approximation.
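As a rough illustration of the idea, the sketch below shows one possible reading of similarity upper approximation clustering; it is not the implementation used in the paper. It assumes that each session starts with the set of sessions whose similarity to it meets a threshold, and that each set is then repeatedly expanded by the similarity sets of its members until nothing changes. The function names, the threshold value, and the toy sessions are hypothetical. Jaccard similarity is used only because it is simple; it ignores visit order, so a sequence-aware measure could be passed in its place.

```python
from typing import Callable, FrozenSet, Hashable, List, Sequence, Set

Session = Sequence[Hashable]


def jaccard(a: Session, b: Session) -> float:
    """Set-based similarity; ignores the order of page visits."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def similarity_upper_approximation(
    sessions: List[Session],
    sim: Callable[[Session, Session], float],
    threshold: float,
) -> Set[FrozenSet[int]]:
    """Return clusters as sets of session indices (illustrative sketch)."""
    n = len(sessions)
    # First approximation: all sessions at least `threshold` similar to session i.
    neighbours = [
        frozenset(j for j in range(n) if sim(sessions[i], sessions[j]) >= threshold)
        for i in range(n)
    ]
    current = list(neighbours)
    while True:
        # Expand each set by the similarity sets of its current members.
        expanded = [
            frozenset().union(*(neighbours[j] for j in members))
            for members in current
        ]
        if expanded == current:  # fixed point reached
            break
        current = expanded
    # Identical sets collapse into a single cluster.
    return set(current)


# Hypothetical toy sessions: each list is a sequence of page-category ids.
toy_sessions = [[1, 2, 3], [1, 2, 4], [2, 3, 1], [7, 8], [8, 7, 9]]
clusters = similarity_upper_approximation(toy_sessions, jaccard, threshold=0.5)
```

With these toy sessions and a threshold of 0.5, the first three sessions end up in one cluster and the last two in another. Lowering the threshold merges more sessions; raising it produces smaller clusters.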
