Data Engineering in Graph Databases

Graph-structured databases have a wide range of emerging applications, e.g., the Semantic Web, eXtensible Markup Language (XML), biological databases and network topologies. To date, a large volume of real-world (possibly cyclic and schemaless) graph-structured data has already accumulated. Data engineering in graph-structured databases has therefore received a lot of attention recently; the area has limitations as well as scope for significant developments. These databases offer many different indexes and query languages, e.g., XQuery, regular expressions, Web Ontology Language and subgraph isomorphism, yet there are few graphical user interfaces for effectively querying subgraphs. In this paper, we examine and evaluate the current state-of-the-art in graph-structured databases with respect to (i) query languages, (ii) dynamic aspects, (iii) data mining, (iv) graphical user interfaces, and (v) modern computer architecture on graph-structured data. In addition, the incremental maintenance of graph indexes/views will be addressed.
