A Quantitative Analysis of Student Solutions to Graph Database Problems

As data grow in both size and connectivity, industry interest in using graph databases has grown rapidly. However, there has been little research on graph database education. To help meet the need to introduce college students to graph databases, this paper is the first to analyze students' errors in homework submissions of queries written in Cypher, the query language of the most prominent graph database, Neo4j. Based on 40,093 student submissions to homework assignments in an upper-level computer science database course at one university, this paper provides a quantitative analysis of students' learning when solving graph database problems. The data show that students struggle most with correctly using Cypher's WITH clause to define variable names before referencing them in a WHERE clause. They also show that these errors persist across multiple homework problems requiring the same techniques. Finally, we suggest a further refinement of the classification of syntactic errors.
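The Python sketch here makes the WITH/WHERE difficulty concrete. It assumes a hypothetical movie schema in which `Actor` and `Movie` nodes are joined by `ACTED_IN`. The query pair and the checker are illustrative assumptions, not queries or tooling from the study.

In the first query, the WHERE clause refers to `numMovies`, but that name is only introduced later by `RETURN ... AS`, so Neo4j rejects the query. The corrected query uses WITH to project the aggregate under a name first, and the following WHERE can then filter on it.

The function `aliases_used_before_definition` is a hypothetical, regex-based heuristic that flags this one pattern. It ignores string literals, comments, and variables bound in MATCH patterns, and it is not a Cypher parser.

```python
import re

# Hypothetical schema for illustration: (:Actor)-[:ACTED_IN]->(:Movie).
# WHERE references numMovies, which is only defined later by RETURN ... AS.
INCORRECT = """
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE numMovies > 5
RETURN a.name, count(m) AS numMovies
"""

# Corrected form: WITH names the aggregate before WHERE filters on it.
CORRECT = """
MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WITH a, count(m) AS numMovies
WHERE numMovies > 5
RETURN a.name, numMovies
"""

# Clause keywords used to split a query into clauses.
CLAUSE = re.compile(r"\b(MATCH|WITH|WHERE|RETURN|ORDER BY|UNWIND)\b", re.IGNORECASE)
# An alias introduced by "AS <name>".
ALIAS = re.compile(r"\bAS\s+([A-Za-z_]\w*)", re.IGNORECASE)


def aliases_used_before_definition(query: str) -> set[str]:
    """Return AS-aliases referenced in a WHERE clause that precedes their definition.

    Rough heuristic for illustration only: no real parsing is done.
    """
    # Offset at which each alias is first introduced with AS.
    defined_at: dict[str, int] = {}
    for m in ALIAS.finditer(query):
        defined_at.setdefault(m.group(1), m.start())

    # (keyword, start offset) for every clause keyword in the query.
    starts = [(m.group(1).upper(), m.start()) for m in CLAUSE.finditer(query)]

    flagged: set[str] = set()
    for i, (keyword, start) in enumerate(starts):
        if keyword != "WHERE":
            continue
        # The WHERE clause runs until the next clause keyword (or end of query).
        end = starts[i + 1][1] if i + 1 < len(starts) else len(query)
        body = query[start:end]
        for name, pos in defined_at.items():
            # Flag aliases mentioned here but only defined further on.
            if pos > start and re.search(rf"\b{re.escape(name)}\b", body):
                flagged.add(name)
    return flagged


print(aliases_used_before_definition(INCORRECT))  # {'numMovies'}
print(aliases_used_before_definition(CORRECT))    # set()
```

A classifier intended for real submissions would parse each query rather than scan it with regular expressions. The regex approach only keeps the example short.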
