Paper information - A clustering method for web data with multi-type interrelated components

A clustering method for web data with multi-type interrelated components

Traditional clustering algorithms work on "flat" data, assuming that data instances can be represented only by a set of homogeneous and uniform features. Much real-world data, however, is heterogeneous in nature, comprising multiple types of interrelated components. We present a clustering algorithm, K-SVMeans, that integrates the well-known K-Means clustering with the highly popular Support Vector Machines (SVM) in order to utilize the richness of the data. Our experimental results on authorship analysis of scientific publications show that K-SVMeans achieves better clustering performance than homogeneous data clustering.

C. Lee Giles | Seyda Ertekin | Levent Bolelli | Ding Zhou

[1] Jason Weston, et al. Fast Kernel Classifiers with Online and Active Learning, 2005, J. Mach. Learn. Res.

[2] Inderjit S. Dhillon, et al. Concept Decompositions for Large Sparse Text Data Using Clustering, 2004, Machine Learning.