Paper information - Clustering remote RDF data using SPARQL update queries

Clustering remote RDF data using SPARQL update queries

The emergence of large and distributed RDF data in the Linked Open Data cloud calls for approaches to extract useful knowledge using machine learning techniques such as clustering. However, the massive size and remote nature of RDF data hinder traditional approaches that gather the datasets onto a centralized location for analysis. In this work, we show how to implement two representative clustering algorithms using update queries against the SPARQL endpoint of the RDF store. We compare the time complexity and the communication complexity of our algorithms with those of approaches that require direct centralized access to the data and hence have to retrieve the entire RDF dataset from the remote location. We conduct experiments on a real social network dataset and report our preliminary findings.

Vasant Honavar | Harris T. Lin | Letao Qi
