K-SVMeans: A Hybrid Clustering Algorithm for Multi-Type Interrelated Datasets

Identifying distinct clusters of documents in text collections has traditionally relied on the assumption that data instances can be represented only by homogeneous, uniform features. Many real-world datasets, however, comprise multiple types of heterogeneous, interrelated components, such as web pages and hyperlinks, or online scientific publications together with their authors and publication venues, to name a few. In this paper, we present K-SVMeans, a clustering algorithm for multi-type interrelated datasets that integrates the well-known K-Means clustering algorithm with the highly popular Support Vector Machines (SVMs). Experimental results on authorship analysis of two real-world web-based datasets show that K-SVMeans can successfully discover topical clusters of documents and achieve better clustering solutions than homogeneous data clustering.
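
The abstract does not say how the K-Means and SVM components interact. The Python code below is a minimal, hypothetical sketch of one way to couple them. It is not the authors' algorithm. The sketch makes these assumptions:

- Each instance has a primary feature vector X[i] (for example, document terms) and a secondary vector Z[i] (for example, author features).
- One-vs-rest linear SVMs are trained on Z by stochastic subgradient descent, using the current cluster labels.
- The assignment cost is the squared distance to a centroid in X minus a weighted SVM score in Z.
- Seeding is farthest-first.

The names `k_svmeans_sketch`, `train_linear_svm`, `assignment_cost`, and the parameter `weight` are illustrative inventions.

```python
from typing import List, Tuple

Vector = List[float]
Model = Tuple[Vector, float]  # (weights w, bias b) of a linear SVM


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train_linear_svm(Z: List[Vector], y: List[int], lam: float = 0.01,
                     eta: float = 0.1, epochs: int = 50) -> Model:
    """Soft-margin linear SVM via stochastic subgradient descent on
    lam/2 * ||w||^2 + hinge loss. Labels are +1/-1; the bias is not regularised."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, t in zip(Z, y):
            margin = t * (dot(w, z) + b)
            w = [(1 - eta * lam) * wj for wj in w]  # step on the L2 term
            if margin < 1:  # hinge loss active: move toward classifying z correctly
                w = [wj + eta * t * zj for wj, zj in zip(w, z)]
                b += eta * t
    return w, b


def assignment_cost(x: Vector, z: Vector, centroid: Vector,
                    model: Model, weight: float) -> float:
    """Hypothetical cost: distance in the primary view minus weighted SVM score."""
    w, b = model
    return sq_dist(x, centroid) - weight * (dot(w, z) + b)


def k_svmeans_sketch(X: List[Vector], Z: List[Vector], k: int,
                     weight: float = 1.0, max_iter: int = 20) -> List[int]:
    # Farthest-first seeding: start at X[0], then add the point farthest
    # from every centroid chosen so far.
    centroids = [list(X[0])]
    while len(centroids) < k:
        far = max(X, key=lambda x: min(sq_dist(x, c) for c in centroids))
        centroids.append(list(far))
    labels = [min(range(k), key=lambda c: sq_dist(x, centroids[c])) for x in X]

    for _ in range(max_iter):
        # One-vs-rest SVMs on the secondary view, trained on current labels.
        models = [train_linear_svm(Z, [1 if lab == c else -1 for lab in labels])
                  for c in range(k)]
        # Reassign each instance to the cluster with the lowest combined cost.
        new_labels = [
            min(range(k), key=lambda c: assignment_cost(x, z, centroids[c], models[c], weight))
            for x, z in zip(X, Z)
        ]
        # Standard K-Means centroid update; an empty cluster keeps its old centroid.
        for c in range(k):
            members = [x for x, lab in zip(X, new_labels) if lab == c]
            if members:
                centroids[c] = mean(members)
        if new_labels == labels:  # assignments are stable
            break
        labels = new_labels
    return labels
```

With `weight=0` the loop reduces to plain K-Means (Lloyd's algorithm) on X. Larger values let the secondary view override small distance differences in the primary view. The abstract does not specify how the actual K-SVMeans balances the two views.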
