This paper presents a code framework and methodology for introducing large, real-time, online data sources into introductory (or advanced) Computer Science courses. The framework is generic in the sense that no prior scaffolding or template specification is needed to make a data source accessible, as long as it uses a standard format such as XML, CSV, or JSON. The implementation described here keeps syntactic overhead minimal while relieving novice programmers of the low-level work of parsing raw data from a web-based source. It interfaces directly with data structures and representations defined by the students themselves, rather than ones predefined and supplied by the library. Together, these features let students and instructors focus on the algorithmic aspects of processing a wide variety of live and large data sources, without dealing with low-level connection, parsing, extraction, and data binding. The library, available at http://cs.berry.edu/big-data, has been used in an introductory programming course based on Processing.
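The central idea of binding raw, standard-format data to types the students define themselves can be sketched as follows. This is an illustrative Python sketch of the general technique, not the library's actual API; the names `Quake` and `bind_csv` are hypothetical, and a real session would fetch the text from a live web source rather than a string literal.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical student-defined record type: the approach described in the
# paper binds data to whatever representation the student writes, rather
# than to classes predefined by the library.
@dataclass
class Quake:
    place: str
    magnitude: float

def bind_csv(text, row_to_record):
    """Parse CSV text and hand each row to a student-supplied binder.

    No schema or template is required up front; the header row drives
    the field names, and the binder maps each row to a record.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [row_to_record(row) for row in reader]

# In a live setting this text would come from a web-based data source.
sample = "place,magnitude\nAlaska,4.2\nChile,5.1\n"
quakes = bind_csv(sample, lambda r: Quake(r["place"], float(r["magnitude"])))
print(quakes[0].magnitude)  # 4.2
```

The point of the sketch is the division of labor: the generic parser handles connection-level and format-level concerns, while the student supplies only the record type and the one-line mapping from raw fields to that type.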