Introduction to Graph Databases

Graphs are increasingly used in analytic environments, with applications in areas such as social network analysis, fraud detection, industrial management, and knowledge analysis. Graph databases are an important option to consider for managing large datasets. This course addresses four aspects of graph management. First, it characterizes graphs and the most common operations applied to them. Second, it reviews the technologies for graph management, focusing on the particular case of Sparksee. Third, it analyzes in depth some important applications and how graphs are used to solve them. Fourth, it explains how benchmarking helps reconcile user requirements with the growth of graph management technologies.
