Evaluating very large datalog queries on social networks

We consider a near-future scenario in which users of a Web 2.0 application, such as a social network, contribute to the application not only data but also rules that automatically query, use, and create data. For example, a user of a social network can define rules that automatically manage the user's friends list, the sending of announcements, the filtering of messages, and more. We examine the likely case in which a participant's connections are added automatically. The connections to be added are defined by a query associated with each participant. To this end, we introduce and study the Query Network model, a graph-based model in which every node represents a network participant and is associated with a Datalog rule. The union of all these individual user rules constitutes a very large, recursive Datalog program whose size is of the same order of magnitude as the size of the data being queried (data that, in a social network, can easily exceed 1 TB). This differs greatly from the traditional assumption that queries are small and data are large. In particular, traditional optimizers will be hard-pressed to handle such queries, even if the queries are translated to SQL (using views) and their union is transformed into a single very large SQL query. We have designed, built, and experimented with evaluation algorithms for such query networks. Experiments with both synthetic and real datasets demonstrate the usefulness and high effectiveness of our methods. We also propose extensions to the model; their implementation and testing are the subject of ongoing work.
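
To make the Query Network idea concrete, the following is a minimal sketch under stated assumptions, not the evaluation algorithms referred to above. It assumes a single binary relation of directed connections, a hypothetical per-participant rule "connect me to the connections of my connections", and plain naive fixpoint iteration. The names `friends_of_friends`, `no_rule`, and `evaluate` are illustrative and do not come from the paper.

```python
from typing import Callable, Dict, Set, Tuple

Edge = Tuple[str, str]
# Hypothetical rule type: given the rule's owner and the current connections,
# return the connections the owner wants to add.
Rule = Callable[[str, Set[Edge]], Set[Edge]]


def friends_of_friends(owner: str, edges: Set[Edge]) -> Set[Edge]:
    # Illustrative Datalog-style rule (an assumption, not the paper's syntax):
    #   conn(owner, Z) :- conn(owner, Y), conn(Y, Z), Z != owner.
    direct = {y for (x, y) in edges if x == owner}
    return {(owner, z) for (y, z) in edges if y in direct and z != owner}


def no_rule(owner: str, edges: Set[Edge]) -> Set[Edge]:
    # A participant who contributes data but no rule.
    return set()


def evaluate(edges: Set[Edge], rules: Dict[str, Rule]) -> Set[Edge]:
    # Naive fixpoint evaluation: apply every participant's rule repeatedly
    # until no rule derives a connection that is not already present.
    result = set(edges)
    changed = True
    while changed:
        changed = False
        for owner, rule in rules.items():
            new = rule(owner, result) - result
            if new:
                result |= new
                changed = True
    return result


base = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}
rules = {
    "alice": friends_of_friends,  # only alice opts in to automatic additions
    "bob": no_rule,
    "carol": no_rule,
    "dave": no_rule,
}
closure = evaluate(base, rules)
print(sorted(closure - base))
```

Because the rule is recursive through the connection relation, alice ends up connected to everyone reachable from her, while bob, who has no rule, gains nothing. With one rule per participant, the program grows with the network itself, which is the regime the abstract contrasts with the usual small-query, large-data assumption.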
