Query Operators for Comparing Uncertain Graphs

Extending graph models to incorporate uncertainty is important for many applications, including citation networks, disease transmission networks, social networks, and observational networks. These networks may have existence probabilities associated with nodes or edges, as well as probabilities associated with attribute values of nodes or edges. Comparing graphs and subgraphs is challenging even without probabilities. Once the uncertainty of different graph elements and attributes is taken into account, traditional graph operators and semantics are insufficient. In this paper, we present a prototype SQL-like graph query language that focuses on operators for querying and comparing uncertain graphs and subgraphs. Two operators of particular interest are ego neighborhood similarity and semantic path similarity. Similarity operators are particularly useful for comparison queries, the focus of this paper. After motivating and describing our operators, we present an implementation of a query engine that uses this query language. This implementation combines a layered and service-oriented architecture and is designed to be extensible, so that simple operators can be used as building blocks for more complex ones. We demonstrate the utility of our query language and operators for analyzing uncertain graphs based on two real-world networks: a dolphin observation network and a citation network. Finally, we conduct a performance evaluation of some of the more complex operators, illustrating the viability of these operators for the analysis of larger graphs.
