A Fresh Look at Introductory Data Science

The proliferation of large, complex datasets has challenged universities to keep up with the demand for graduates trained in both the statistical and the computational skills required to plan, acquire, manage, and analyze such data and to communicate the findings effectively. To meet this demand, attracting students to data science early and giving them a solid introduction to the field become increasingly important. We present a case study of an introductory undergraduate course in data science designed to address these needs. Offered at Duke University, this course has no prerequisites and serves a wide audience of aspiring statistics and data science majors as well as humanities, social sciences, and natural sciences students. We discuss the unique challenges posed by offering such a course and, in light of these challenges, present a detailed discussion of the course's pedagogical design elements, content, structure, computational infrastructure, and assessment methodology. We also offer a repository containing all teaching materials, which are open-source, along with supplemental materials and the R code for reproducing the figures found in the paper.
