Distributed Graph Database for Large-Scale Social Computing

We present an efficient distributed graph database architecture for large-scale social computing. The architecture consists of a distributed graph data processing system and a distributed graph data storage system, and it combines the strengths of both to achieve efficient social computing. We conduct extensive experiments to evaluate the performance of our system, using four real-world, large-scale social networks (YouTube, Flickr, LiveJournal, and Orkut) as test data. We also implement several representative social applications and graph algorithms to examine the performance of our system. We employ two main optimization techniques in our system: indexing and graph partitioning. Experimental results indicate that our system outperforms GoldenOrb, an implementation of Google's Pregel model.
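To make the baseline and the partitioning optimization concrete, here is a minimal single-process sketch of Pregel-style, vertex-centric computation. It assumes integer vertex ids, Pregel's default hash partitioning (vertex id mod number of workers), and PageRank as the example graph algorithm. The function names (`hash_partition`, `edge_cut`, `pregel_pagerank`) are hypothetical. This is not the authors' system or GoldenOrb's API. Messages sent along cut edges would cross the network in a real deployment, which is why a better partitioning can reduce communication.

```python
from collections import defaultdict


def hash_partition(vertices, num_workers):
    """Assign each vertex to a worker as id mod num_workers (hash of an int is the int)."""
    parts = defaultdict(list)
    for v in vertices:
        parts[hash(v) % num_workers].append(v)
    return dict(parts)


def edge_cut(edges, partitions):
    """Count edges whose endpoints live on different workers (cross-worker messages)."""
    owner = {v: w for w, owned in partitions.items() for v in owned}
    return sum(1 for u, v in edges if owner[u] != owner[v])


def pregel_pagerank(edges, num_workers=2, supersteps=30, damping=0.85):
    """Run PageRank as a sequence of supersteps (send messages, barrier, compute).

    Hypothetical sketch: assumes every vertex has at least one out-edge,
    so no rank mass is lost to dangling vertices.
    """
    out = defaultdict(list)
    vertices = set()
    for u, v in edges:
        out[u].append(v)
        vertices.update((u, v))
    n = len(vertices)
    partitions = hash_partition(sorted(vertices), num_workers)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(supersteps):
        # Message phase: each worker sends rank shares along its vertices' out-edges.
        inbox = defaultdict(float)
        for owned in partitions.values():
            for v in owned:
                if out[v]:
                    share = rank[v] / len(out[v])
                    for w in out[v]:
                        inbox[w] += share  # message to w, possibly on another worker
        # Compute phase (after the implicit barrier): every vertex updates its rank.
        rank = {v: (1 - damping) / n + damping * inbox[v] for v in vertices}
    return rank
```

For example, on the directed 3-cycle `[(0, 1), (1, 2), (2, 0)]` with two workers, vertices 0 and 2 land on worker 0 and vertex 1 on worker 1. Two of the three edges are then cut, and every vertex converges to rank 1/3.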