Teaching Future Big Data Analysts: Curriculum and Experience Report

This paper documents the learning objectives, curriculum design, technology infrastructure, and classroom experience for a "big data mining and analytics" course at a small liberal arts college. The course is an elective for our Data Analytics minor and for computer science and computer information systems majors. It introduces students to data analysis, statistics, and plotting with Unix tools and the R language, then transitions into big data projects that use Apache Hadoop, HDFS, and MapReduce; Apache Spark; Apache Hive; and related tools. A primary learning objective is that students demonstrate the ability to identify which tools are most appropriate for specific datasets and data analysis tasks. We also expect students to be able to communicate their findings to a general audience. Because our students are potential future data analysts, we aim to give them the skills and sensibility to solve data analysis problems efficiently, big data or otherwise, in their careers.
