Discovering Informative Subgraphs in RDF Graphs

Discovering patterns in graphs has long been an area of interest. Most contemporary approaches to pattern discovery measure the interestingness of a pattern by either quantitative anomalies or the frequency of a substructure. In this paper we address the problem of discovering informative subgraphs within RDF graphs. We motivate our work with an example from Semantic Search. A user might pose a question of the form “What are the most relevant ways in which entity X is related to entity Y?”, the response to which is a subgraph connecting X to Y. The relevance of the discovered subgraph therefore depends on the amount of useful information it conveys to the user, which in turn depends on the meaning of the edges in the subgraph. We introduce heuristics that guide a discovery algorithm away from banal paths and towards more informative ones. This guidance is based on weighting mechanisms, driven by edge semantics, for the edges in the RDF graph. We present an analysis of the quality of the generated subgraphs with respect to path ranking metrics. We conclude by presenting intuitions about which of our weighting schemes and heuristics produce higher-quality subgraphs.
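The idea of steering discovery away from banal paths with semantics-driven edge weights can be illustrated with a minimal sketch. The abstract does not specify the weighting schemes or heuristics, so everything here is an assumption made for illustration: the toy triples, the names `edge_costs` and `most_informative_path`, the use of predicate rarity as a stand-in for informativeness, and the reduction of "subgraph" to a single lowest-cost path found with Dijkstra's algorithm.

```python
import heapq
from collections import Counter, defaultdict

# Hypothetical toy RDF triples (subject, predicate, object); not taken from the paper.
TRIPLES = [
    ("X", "rdf:type", "Person"),
    ("Y", "rdf:type", "Person"),
    ("X", "livesIn", "CityA"),
    ("Y", "livesIn", "CityA"),
    ("Z", "livesIn", "CityA"),
    ("X", "coAuthorOf", "W"),
    ("W", "advisedBy", "Y"),
]


def edge_costs(triples):
    """Assumed weighting: a predicate's cost is its relative frequency,
    so common (banal) predicates are expensive and rare ones are cheap."""
    freq = Counter(p for _, p, _ in triples)
    total = len(triples)
    return {p: n / total for p, n in freq.items()}


def most_informative_path(triples, source, target):
    """Return (total_cost, path) for the cheapest path between two entities,
    or None if they are not connected. Edges are traversed in both directions."""
    cost = edge_costs(triples)
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((o, p))
        adj[o].append((s, p))

    heap = [(0.0, source, [])]  # (cost so far, current node, edges taken)
    settled = set()
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node == target:
            return dist, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, pred in adj[node]:
            if nxt not in settled:
                step = (node, pred, nxt)
                heapq.heappush(heap, (dist + cost[pred], nxt, path + [step]))
    return None


if __name__ == "__main__":
    result = most_informative_path(TRIPLES, "X", "Y")
    print(result)
```

On this toy data the search prefers the route through `coAuthorOf` and `advisedBy` over the shorter-looking but more common `rdf:type` and `livesIn` connections, because each of the rare predicates occurs only once. A real system would return a connected subgraph rather than one path and would use weights derived from the schema and instance data.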
