A Hybrid Method for Patterns Mining and Outliers Detection in the Web Usage Log

This paper presents a novel approach to pattern mining and outlier detection in Web usage logs. The approach combines kernel methods with fuzzy clustering. Web log records are treated as vectors with numeric and nominal attributes. A special kernel maps these vectors into a high-dimensional feature space, where a possibilistic clustering method computes a measure of the "typicalness" of each vector. If the value of this measure for a particular record is below a specified threshold, the record is labeled as an outlier. Records with high typicalness are considered access patterns of user activity. The performance of the approach is demonstrated experimentally.
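The pipeline described above (kernel mapping, possibilistic clustering in feature space, thresholding of typicalness) can be sketched in a few dozen lines. The abstract does not specify the "special kernel", so the sketch assumes a hypothetical stand-in: an RBF kernel on the numeric attributes averaged with the fraction of matching nominal attributes. The clustering step assumes a kernelized possibilistic c-means in the style of Krishnapuram and Keller, with fuzzifier m = 2, cluster seeds chosen by hand, and the scale parameters eta fixed after initialization. The function names, parameters, thresholds and toy log records are illustrative assumptions, not the authors' implementation.

```python
import math


def mixed_kernel(a, b, gamma=0.5, weight=0.5):
    """Hypothetical kernel for log records given as (numeric, nominal) tuples.

    Numeric fields: RBF kernel. Nominal fields: fraction of matching values.
    A convex combination of two PSD kernels is PSD, and k(x, x) == 1.
    """
    (xa, ca), (xb, cb) = a, b
    rbf = math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(xa, xb)))
    overlap = sum(p == q for p, q in zip(ca, cb)) / len(ca) if ca else 1.0
    return weight * rbf + (1.0 - weight) * overlap


def gram_matrix(records, kernel=mixed_kernel):
    return [[kernel(a, b) for b in records] for a in records]


def feature_sq_dists(K, w):
    """Squared feature-space distance from each phi(x_j) to the weighted
    center v = sum_k w_k phi(x_k) / sum_k w_k, using only kernel values."""
    n, s = len(K), sum(w)
    cross = [sum(w[k] * K[j][k] for k in range(n)) / s for j in range(n)]
    center_norm = sum(w[k] * cross[k] for k in range(n)) / s
    return [max(0.0, K[j][j] - 2.0 * cross[j] + center_norm) for j in range(n)]


def kernel_pcm(K, seeds, m=2.0, iters=50):
    """Sketch of possibilistic c-means in feature space; returns U[i][j],
    the typicality of record j for cluster i (seeds are assumed given)."""
    n = len(K)
    U = []
    for s in seeds:  # initial typicalities from distances to each seed record
        d2 = feature_sq_dists(K, [1.0 if k == s else 0.0 for k in range(n)])
        scale = sum(d2) / n or 1.0
        U.append([1.0 / (1.0 + d / scale) for d in d2])
    etas = []  # eta_i estimated once from the initial partition, then fixed
    for u in U:
        w = [x ** m for x in u]
        d2 = feature_sq_dists(K, w)
        etas.append(sum(a * b for a, b in zip(w, d2)) / sum(w) or 1.0)
    for _ in range(iters):
        # PCM update: u_ij = 1 / (1 + (d_ij^2 / eta_i)^(1 / (m - 1)))
        U = [
            [1.0 / (1.0 + (d / eta) ** (1.0 / (m - 1.0)))
             for d in feature_sq_dists(K, [x ** m for x in u])]
            for u, eta in zip(U, etas)
        ]
    return U


def label_records(U, outlier_threshold, pattern_threshold):
    """Typicality of a record = its largest typicality over all clusters."""
    typ = [max(col) for col in zip(*U)]
    outliers = [j for j, t in enumerate(typ) if t < outlier_threshold]
    patterns = [j for j, t in enumerate(typ) if t >= pattern_threshold]
    return typ, outliers, patterns


if __name__ == "__main__":
    # Hypothetical records: (normalized hour, normalized size), (method, status, URL)
    log = [
        ((0.10, 0.20), ("GET", "200", "/index.html")),
        ((0.12, 0.22), ("GET", "200", "/index.html")),
        ((0.11, 0.19), ("GET", "200", "/index.html")),
        ((0.80, 0.75), ("GET", "200", "/catalog")),
        ((0.82, 0.77), ("GET", "200", "/catalog")),
        ((0.79, 0.74), ("GET", "200", "/catalog")),
        ((5.00, 9.00), ("POST", "500", "/cgi-bin/x")),
    ]
    U = kernel_pcm(gram_matrix(log), seeds=[0, 3])
    typ, outliers, patterns = label_records(U, 0.5, 0.9)
    print("typicality:", [round(t, 3) for t in typ])
    print("outliers:", outliers, "patterns:", patterns)
```

Because every distance is computed from kernel values alone, the feature space is never built explicitly, so the same code works with any positive semidefinite kernel on log records. Possibilistic typicalities need not sum to one across clusters, which is what lets a record that is far from every cluster receive a low typicality everywhere and be flagged as an outlier.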
