A Graph Database of Yelp Dataset Challenge 2018 and Using Cypher for Basic Statistics and Graph Pattern Exploration

In this paper, we use Neo4j, a popular graph database, to store the real-world dataset from the Yelp Dataset Challenge 2018. The database keeps the data persistently available, so users can retrieve it for many applications with Cypher, Neo4j's graph query language. Users can explore the dataset by combining Cypher with Neo4j clients for languages such as Python and R and with server plugins such as APOC and the graph algorithms library. We explain the basic concepts and applications of Cypher's graph pattern language. To demonstrate how the database can be used, we use Cypher to obtain basic statistics of the dataset, and we use Cypher with the graph algorithms library to explore interesting graph patterns such as bipartite graphs and connected components.
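To make the two graph patterns named above concrete, the following is a minimal in-memory sketch in Python. It is not the paper's method, which relies on Cypher and the Neo4j graph algorithms library. The toy `REVIEWS` edge list and the helper names `build_adjacency`, `connected_components`, and `is_bipartite` are hypothetical and chosen only for illustration. The sketch assumes that users and businesses form the two sides of the graph and that each review contributes one user-business edge.

```python
from collections import defaultdict, deque

# Hypothetical toy data: each pair is one review edge (user, business).
# The real Yelp dataset is far larger and lives in Neo4j, not in a list.
REVIEWS = [
    ("u1", "b1"),
    ("u1", "b2"),
    ("u2", "b2"),
    ("u3", "b3"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from (user, business) pairs."""
    adj = defaultdict(set)
    for user, business in edges:
        adj[user].add(business)
        adj[business].add(user)
    return adj


def connected_components(adj):
    """Return a list of node sets, one per connected component (BFS)."""
    seen = set()
    components = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def is_bipartite(adj):
    """Check bipartiteness by two-coloring every component with BFS."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in color:
                    color[neighbor] = 1 - color[node]
                    queue.append(neighbor)
                elif color[neighbor] == color[node]:
                    # Two adjacent nodes share a color: an odd cycle exists.
                    return False
    return True


if __name__ == "__main__":
    graph = build_adjacency(REVIEWS)
    for component in connected_components(graph):
        print(sorted(component))
    print("bipartite:", is_bipartite(graph))
```

In the toy data, `u1` and `u2` end up in the same component because both reviewed `b2`, while `u3` and `b3` form a separate component. On the full dataset, the same questions would be posed to the database itself rather than to Python structures.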
