Analyzing graph databases by aggregate queries

An important step in data analysis is the exploration of data. For relational data, one of the most powerful tools for such exploration is the relational database itself, together with the aggregates and rankings it can compute: for instance, simple statistics such as the average number of links between two types of entities (relations) are easily computed with a query on a relational database and may already provide valuable information. For the exploration of graph data, however, relational databases may not be the most practical or scalable choice. For instance, a statistic such as the shortest path between two given nodes cannot be computed by a relational database. Surprisingly, tools for querying graph and network databases are much less well developed than those for relational data, and only recently has an increasing number of studies been devoted to graph or network databases. Our position is that the development of such graph databases is important both to make basic graph mining easier and to prepare data for more complex types of analysis. An important component of such databases is the query language, which should support aggregating queries such as shortest path queries. In this paper, we propose an extension to a previously proposed query language. This extension allows databases to be queried and analyzed using aggregates and ranking. A notable feature of our language is that it also supports probabilistic graph queries by treating them as aggregating queries. We demonstrate its value on a simple data analysis task.
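To make the idea of an aggregating graph query concrete, the following is a minimal Python sketch and not the query language proposed in the paper. It treats a shortest path query as a `min` aggregate over the lengths of all simple paths between two nodes, and, as a simplified stand-in for a probabilistic query, takes a `max` aggregate over the product of edge probabilities along each path. The toy graph, its edge probabilities, and all function names (`simple_paths`, `shortest_path_length`, `most_probable_path`) are assumptions made purely for illustration.

```python
def simple_paths(graph, source, target, path=None):
    """Yield every simple (cycle-free) path from source to target as a node list."""
    path = [source] if path is None else path
    if source == target:
        yield list(path)  # copy, because `path` is mutated while backtracking
        return
    for neighbor in graph.get(source, {}):
        if neighbor not in path:
            path.append(neighbor)
            yield from simple_paths(graph, neighbor, target, path)
            path.pop()


def path_probability(graph, path):
    """Multiply the (assumed independent) edge probabilities along a path."""
    probability = 1.0
    for u, v in zip(path, path[1:]):
        probability *= graph[u][v]
    return probability


def shortest_path_length(graph, source, target):
    """Shortest path as an aggregate: min over the edge counts of all paths."""
    lengths = (len(p) - 1 for p in simple_paths(graph, source, target))
    return min(lengths, default=None)


def most_probable_path(graph, source, target):
    """Illustrative probabilistic aggregate: max over per-path probabilities."""
    probabilities = (path_probability(graph, p)
                     for p in simple_paths(graph, source, target))
    return max(probabilities, default=0.0)


# Hypothetical toy graph: graph[u][v] is the probability that edge u -> v exists.
graph = {
    "a": {"b": 0.5, "c": 0.25, "d": 0.125},
    "b": {"d": 0.5},
    "c": {"d": 0.5},
    "d": {},
}

print(shortest_path_length(graph, "a", "d"))  # 1
print(most_probable_path(graph, "a", "d"))    # 0.25
```

Enumerating all simple paths is exponential in general and is used here only to show that both queries share the same shape: generate paths, map each path to a value, then aggregate. A practical system would evaluate such aggregates with dedicated algorithms rather than by enumeration.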
