Emerging Pragmatic Patterns in Large-Scale

With the growth of Linked Data, an increasing number of RDF datasets are being published across many application domains. To understand the underlying meaning and characteristics of large RDF data, and to reuse popular domain terms when publishing data, capturing emerging pragmatic patterns is critical. In this paper, we propose the notion of a term co-instantiation graph (TIG) and a method to build a TIG for a given RDF dataset. We also describe a clustering-based approach to distill from a TIG a set of pragmatic patterns, which reveal how highly correlated terms are customarily used together. Through extensive experiments on a large real-world dataset containing 21 M RDF documents, we analyze the macroscopic structure of the term co-instantiation graph and of the pragmatic patterns from a complex-network point of view, and demonstrate that our approach not only yields a fine-grained partitioning of ontologies from the pragmatic perspective, which eases ontology reuse, but also provides a new way to explore Linked Data.
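As a rough illustration of what a term co-instantiation graph might look like, the following minimal Python sketch builds a weighted graph over terms from a handful of triples. The abstract does not define co-instantiation precisely, so the sketch assumes that two terms (classes or properties) are co-instantiated whenever both are used to describe the same subject, and that the edge weight is the number of such subjects. The function name `build_tig`, the prefixed-name strings, and the sample triples are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations

RDF_TYPE = "rdf:type"  # prefixed name used here for brevity (assumption)


def build_tig(triples):
    """Return a Counter mapping unordered term pairs to co-instantiation counts.

    Assumption: a subject "instantiates" every class it is typed with and
    every property it appears as the subject of.
    """
    terms_by_subject = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            terms_by_subject[s].add(o)  # class used to type the subject
        else:
            terms_by_subject[s].add(p)  # property used on the subject

    edges = Counter()
    for terms in terms_by_subject.values():
        # Sort so each unordered pair has a single canonical key.
        for a, b in combinations(sorted(terms), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical sample data.
triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
]
tig = build_tig(triples)
```

In this toy input, `foaf:Person` and `foaf:name` are used together on two subjects, so their edge carries the highest weight. A clustering step over such weights is one plausible way to group highly correlated terms, although the paper's actual similarity measure and clustering algorithm may differ.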
