A Generic Framework for Engaging Online Data Sources in Introductory Programming Courses

This paper presents a code framework and methodology that facilitate the introduction of large, real-time, online data sources into introductory (or advanced) Computer Science courses. The framework is generic in the sense that no prior scaffolding or template specification is needed to make the data accessible, as long as the source uses a standard format such as XML, CSV, or JSON. The implementation described here maintains minimal syntactic overhead while relieving novice programmers of the low-level issues involved in parsing raw data from a web-based data source. It interfaces directly with data structures and representations defined by the students themselves, rather than with ones predefined and supplied by the library. Together, these features allow students and instructors to focus on the algorithmic aspects of processing a wide variety of large, live data sources, without having to deal with the low-level details of connection, parsing, extraction, and data binding. The library, available at http://cs.berry.edu/big-data, has been used in an introductory programming course based on Processing.
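To make the idea of binding data to student-defined structures concrete, the following Java program is a minimal sketch of one way such binding could work. It is not the library's actual API: the names `BindingSketch`, `Quake`, and `bind` are hypothetical, the sample data is invented for illustration, and the CSV text is supplied as a string rather than fetched from a live source. The sketch assumes a simple CSV layout with a header row and no quoted fields, and it uses reflection to match requested columns to the constructor of a class the student has written.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BindingSketch {

    // A student-defined class; the binding code knows nothing about it in advance.
    public static class Quake {
        public final String place;
        public final double magnitude;

        public Quake(String place, double magnitude) {
            this.place = place;
            this.magnitude = magnitude;
        }
    }

    // Hypothetical helper (an assumption, not the library's method): builds one
    // object of type cls per data row, passing the named columns, in order,
    // to a public constructor with a matching number of parameters.
    public static <T> List<T> bind(String csv, Class<T> cls, String... columns)
            throws ReflectiveOperationException {
        String[] lines = csv.trim().split("\\R");
        List<String> header = Arrays.asList(lines[0].split(","));

        // Locate the position of each requested column in the header row.
        int[] positions = new int[columns.length];
        for (int j = 0; j < columns.length; j++) {
            positions[j] = header.indexOf(columns[j]);
            if (positions[j] < 0) {
                throw new IllegalArgumentException("Unknown column: " + columns[j]);
            }
        }

        // Pick the first public constructor whose arity matches the request.
        Constructor<?> ctor = null;
        for (Constructor<?> c : cls.getConstructors()) {
            if (c.getParameterCount() == columns.length) {
                ctor = c;
                break;
            }
        }
        if (ctor == null) {
            throw new NoSuchMethodException("No constructor with "
                    + columns.length + " parameters in " + cls.getName());
        }
        Class<?>[] types = ctor.getParameterTypes();

        List<T> result = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            // Naive split: assumes no quoted fields or embedded commas.
            String[] cells = lines[i].split(",");
            Object[] args = new Object[columns.length];
            for (int j = 0; j < columns.length; j++) {
                args[j] = convert(cells[positions[j]].trim(), types[j]);
            }
            result.add(cls.cast(ctor.newInstance(args)));
        }
        return result;
    }

    // Converts a raw text cell to the type the constructor parameter expects.
    private static Object convert(String text, Class<?> type) {
        if (type == int.class || type == Integer.class) {
            return Integer.parseInt(text);
        }
        if (type == double.class || type == Double.class) {
            return Double.parseDouble(text);
        }
        if (type == boolean.class || type == Boolean.class) {
            return Boolean.parseBoolean(text);
        }
        return text; // fall back to the raw string
    }

    public static void main(String[] args) throws Exception {
        // Invented sample data standing in for a live online source.
        String csv = "place,depth,magnitude\nAlaska,10,4.5\nChile,35,6.1\n";
        List<Quake> quakes = bind(csv, Quake.class, "place", "magnitude");
        for (Quake q : quakes) {
            if (q.magnitude > 5.0) {
                System.out.println(q.place);
            }
        }
    }
}
```

Run as written, the program prints `Chile`. The point of the sketch is that the student writes only the `Quake` class and the final loop; the column selection, type conversion, and object construction are handled generically, which is the division of labor the framework aims for.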