Fast Best-Effort Search on Graphs with Multiple Attributes

We address the problem of search on graphs with multiple nodal attributes. We call such graphs weighted attribute graphs (WAGs). Each node of a WAG carries multiple attributes, each with a non-negative weight. WAGs are ubiquitous in real-world applications. For example, in a co-authorship WAG, each author is a node; each attribute corresponds to a particular topic (e.g., databases, data mining, or machine learning); and the author's amount of expertise in a topic is represented by a non-negative weight on that attribute. A typical search in this setting specifies both the connectivity between nodes and constraints on the weights of nodal attributes. For example, a user's search may be: find three coauthors (i.e., a triangle) where each author's expertise is greater than 50 percent in at least one topic area (i.e., attribute). We propose a ranking function that unifies the ranking of graph structure and of nodal attribute weights. We prove that the problem of retrieving the optimal answer for graph search on WAGs is NP-complete. Moreover, we propose a fast and effective top-k graph search algorithm for WAGs. In an extensive experimental study with multiple real-world graphs, our proposed algorithm exhibits significant speed-up over competing approaches. On average, our proposed method is more than 7× faster in query processing than the best competitor.
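The co-authorship example can be made concrete with a small sketch. The code below is only an illustration of the query semantics under stated assumptions: the toy authors, weights, edges, and the helper names `qualifies` and `matching_triangles` are hypothetical, and the exhaustive enumeration is a naive baseline, not the top-k algorithm or the ranking function proposed in the paper.

```python
from itertools import combinations

# Hypothetical toy co-authorship WAG: node -> {attribute: non-negative weight}.
weights = {
    "ann": {"databases": 0.7, "data mining": 0.3},
    "bob": {"data mining": 0.6, "machine learning": 0.4},
    "cat": {"machine learning": 0.9, "databases": 0.1},
    "dan": {"databases": 0.4, "data mining": 0.3, "machine learning": 0.3},
}

# Undirected co-authorship edges, stored as unordered pairs.
edges = {
    frozenset(pair)
    for pair in [
        ("ann", "bob"),
        ("bob", "cat"),
        ("ann", "cat"),
        ("cat", "dan"),
        ("ann", "dan"),
    ]
}


def qualifies(node, threshold=0.5):
    """True if the node's weight exceeds the threshold on at least one attribute."""
    return any(w > threshold for w in weights[node].values())


def matching_triangles(threshold=0.5):
    """Naively enumerate all triangles whose three nodes satisfy the weight constraint."""
    result = []
    for trio in combinations(sorted(weights), 3):
        # Structural constraint: all three pairs must be connected.
        connected = all(frozenset(pair) in edges for pair in combinations(trio, 2))
        # Attribute constraint: every node must qualify.
        if connected and all(qualifies(n, threshold) for n in trio):
            result.append(trio)
    return result


print(matching_triangles())
```

Here the triangle ann, cat, dan satisfies the structural constraint but is rejected at the 50 percent threshold because dan has no topic weight above 0.5. Exhaustive enumeration like this scales poorly with graph size, which is the kind of cost a dedicated top-k search algorithm aims to avoid.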

[22]  Spiros Papadimitriou,et al.  Fast Best-Effort Search on Graphs with Multiple Attributes , 2015, IEEE Trans. Knowl. Data Eng..