A Randomized Controlled Trial on the Wild Wild West of Scientific Computing with Student Learners

Scientific computing has become an area of growing importance. Across fields such as biology, education, and physics, people increasingly use scientific computing to model and understand the world around them. Despite this clear need, almost no systematic analysis has been conducted on how students in fields outside of computer science learn to program in the context of scientific computing. Because many fields do not explicitly teach much programming, their students may have to learn this important skill on their own. To help, we used rigorous quantitative and qualitative methods to examine the process 154 students followed in a randomized controlled trial on alternative styles of programming that can be used in R. Our results suggest that the barriers students face in scientific computing are non-trivial, and this work has two core implications: 1) students learning scientific computing on their own struggle significantly in many different ways, even if they have had prior programming training, and 2) the design of the current generation of scientific computing feels like the wild wild west, and the designs can be improved in ways we will enumerate.
