Teaching Data Science in a Statistical Curriculum: Can We Teach More by Teaching Less?
In many universities, Statistics departments have seen a surge of majors and concentrators (or minors) over the past decade. Few would argue that this trend has nothing to do with the increasing popularity of data-driven solutions in both the private and public sectors. Statistics is considered by some to be equivalent to Data Science, and by most to be a part of it or to overlap with it. Most data scientists would also agree that Statistics is central to the foundation of these data-driven products. On the one hand, Statistics students regard themselves as already having one foot in data science; on the other hand, they experience confusion and frustration when they find themselves less competitive in job interviews or hackathons than their peers from the computational sciences. One common "complaint" from statistics students is that they have not been equipped with the latest computational skills and knowledge of big data technologies.
As educators, shall we cater to the needs of our students and start teaching them data science skills? Given that data science technologies are ever-changing, which "data science skills" shall we teach them? Which of our faculty can teach these skills? We cannot have a curriculum for just about everything. The main question here is not "shall we teach them Python or Hadoop?" but "how shall we prepare our Statistics students for a career in Data Science as Statisticians?"
Currently, Data Science is a fast-evolving field that represents, in many fields, a new approach to acquiring knowledge, collecting evidence, reasoning about decisions, and making predictions, much of which is actually not new to Statistics. The process that produces a data science product can be viewed as a sequence of meticulously engineered decisions (or procedures) for data collection, data processing, data analysis, and result interpretation.
While most current data science efforts have focused on how to apply these decisions to Big Data (or to "cope with" Big Data, as Dr. Donoho described in his article), Statistics is primarily concerned with evaluating and improving the validity of these decisions. In his article, "50 Years of Data Science," Dr. Donoho provided in-depth retrospectives and perspectives on data science, especially in relation to Statistics. He reviewed the evolution of data science as a "science of learning from data." This definition of data science focuses on what we hope to accomplish and advance, rather than on what we use or study. It also explains well why data science as a field, although supported by a set of fundamental principles, evolves quickly with current data collection and processing technology. Many of these