Robust keyword search in large attributed graphs

There is a growing need to explore attributed graphs such as social networks, expert networks, and biological networks. A well-known mechanism that lets non-technical users explore such graphs is keyword search, which takes a set of query keywords and returns a connected subgraph containing them. However, existing approaches, such as methods based on shortest paths between the nodes that contain the query keywords, may generate weakly connected answers because they ignore the structure of the whole graph. To address this issue, we formulate and solve the robust keyword search problem for attributed graphs, which aims to find strongly connected answers. We prove that the problem is NP-hard and propose a solution based on random walk with restart (RWR). To improve the efficiency and scalability of RWR, we use Monte Carlo approximation, and we also propose a distributed version, which we implement in Apache Spark. Finally, we provide experimental evidence of the efficiency and effectiveness of our approach on real-world graphs.
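To make the Monte Carlo approximation of RWR concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the common end-point estimator: each simulated walk starts at the seed node, stops with probability `restart_prob` before every step, and otherwise moves to a uniformly random neighbor; the fraction of walks ending at a node estimates that node's RWR score. The function name, parameters, adjacency-list format, and example graph are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def monte_carlo_rwr(adj, seed, restart_prob=0.15, num_walks=10000, rng=None):
    """Estimate RWR scores relative to `seed` by simulating random walks.

    adj: dict mapping each node to a list of its neighbors (assumed format).
    Returns a dict mapping each visited end node to its estimated score.
    """
    rng = rng or random.Random(0)  # fixed default seed for reproducibility
    end_points = Counter()
    for _ in range(num_walks):
        node = seed
        # Continue walking with probability 1 - restart_prob at every step.
        while rng.random() >= restart_prob:
            neighbors = adj[node]
            if not neighbors:
                # Assumption: a walk that reaches a dangling node stops there.
                break
            node = rng.choice(neighbors)
        end_points[node] += 1
    # The empirical end-point distribution approximates the RWR vector.
    return {node: count / num_walks for node, count in end_points.items()}


# Hypothetical toy graph: a triangle (a, b, c) with a pendant node d.
example_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

if __name__ == "__main__":
    scores = monte_carlo_rwr(example_graph, seed="a")
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(node, round(score, 3))
```

Because the walks are independent of one another, they can be partitioned across workers and their end-point counts summed, which is one reason a sampling scheme of this kind lends itself to a distributed setting such as Spark. How the paper actually distributes the computation is not stated in the abstract.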
