INTRODUCTION

The Netflix Prize (Bennett & Lanning, 2007) was a one-million-dollar prize offered by Netflix to the team able to reduce the root mean square error of its recommender system by 10%. The excitement generated by the prize is reflected in the announcement of the winner on the front page of the New York Times (Lohr, 2009). Over a three-year period, the team "BellKor's Pragmatic Chaos" beat over 50,000 competitors from around the world using data mining algorithms.

This paper describes a competition, inspired by the Netflix Prize, for a graduate information decision management course taught in winter 2012. A dataset of 276 monthly observations of Catalina Island cross-channel visitors was obtained from the Catalina Island Chamber of Commerce & Visitors Bureau. Because students were encouraged to use the statistical program JMP (2012), we called the competition the "JMP to Catalina Island Competition." The competition was used to illustrate a data mining technique, which was one of the objectives of the course.

THE VALUE OF COMPETITIONS IN THE CLASSROOM

A modest literature exists on the use of competition to promote learning in the classroom. J. R. Anderson (2006), for example, looks at the use of competitive and cooperative approaches to motivate students and concludes that a balanced approach is best. Bandura (1977) created a theory of social learning: we learn through observing others. Online competitions, however, are only just beginning, and a large body of knowledge does not yet exist. Still, the stage is set. A new website, Kaggle.com, has been set up to help instructors create online competitions and may provide a rich opportunity for further study. The site also describes several other crowdsourcing competitions. One of this paper's authors was introduced to the site at the 2012 Joint Statistical Meetings. For more on this subject, see Elkan (2012), Gonzalez-Brenes (2012) and Sonas (2012).
DATA MINING

Data mining "is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner" (Hand, Mannila and Smyth, 2001, p. 1). Data mining is a convergence of statistics, computer science and database principles. A good description of the data mining process can be found at SAS.com (n.d.) in the SEMMA methodology: sample, explore, modify, model, assess. That is, one begins by sampling from the dataset and partitioning the sample into training, validation and test sets. Exploratory data analysis is then performed using data visualization and other simple techniques. The data may then be transformed through normalization or linear transformations. Modeling the data is typically accomplished through computer-intensive methods. Finally, the models are assessed by their ability to predict the data in the test set. This data mining process can be contrasted with the traditional hypothesis testing methods of statistics, where the test is stated a priori.

JMP

JMP is an interactive, visual statistics package that incorporates many data mining techniques (including neural nets and classification and regression trees) as well as more traditional techniques such as regression, ANOVA and time series methods. It also has modules for design of experiments (DOE), quality control (control charts, Pareto plots, cause and effect diagrams, Taguchi analysis and capability indexes) and survival analysis. JMP is a subsidiary of SAS and has the support and training SAS is known for, including live and on-demand Webcasts, seminars, conferences, textbooks, and more. Students can download a free 30-day copy or pay $30 or $50 for six- or twelve-month access, respectively. A more limited version, bundled with a textbook, can be obtained for around $10. An instructor considering using JMP in the classroom should start at JMP.com.
A list of textbooks using JMP can be found at that site. …
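The SEMMA steps described in the DATA MINING section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the competition: the 276 monthly values are synthetic stand-ins (not the Catalina Island data), the chronological 60/20/20 split and the straight-line trend model are arbitrary choices, and the helper names (`partition`, `fit_line`, `rmse`, `scale`, `unscale`, `predict`) are hypothetical. The assessment step uses root mean square error, the metric named in the Netflix Prize, and reports the percentage reduction relative to a naive baseline.

```python
import math
import random
import statistics


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def partition(rows, train_frac=0.6, valid_frac=0.2):
    """Sample: split rows, in order, into training, validation and test sets."""
    n_train = int(len(rows) * train_frac)
    n_valid = int(len(rows) * valid_frac)
    return rows[:n_train], rows[n_train:n_train + n_valid], rows[n_train + n_valid:]


def fit_line(xs, ys):
    """Model: ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Synthetic stand-in data (an assumption, NOT the Catalina Island series):
# 276 monthly values with a linear trend plus Gaussian noise.
random.seed(0)
rows = [(t, 50_000 + 40 * t + random.gauss(0, 3_000)) for t in range(276)]

# Sample: chronological training / validation / test partition.
train, valid, test = partition(rows)

# Explore: simple summary statistics of the training response.
train_y = [y for _, y in train]
print("training mean:", round(statistics.fmean(train_y)),
      "min:", round(min(train_y)), "max:", round(max(train_y)))

# Modify: min-max normalization, using training-set statistics only.
lo, hi = min(train_y), max(train_y)


def scale(y):
    return (y - lo) / (hi - lo)


def unscale(z):
    return lo + z * (hi - lo)


# Model: fit a straight-line trend to the normalized training data.
intercept, slope = fit_line([t for t, _ in train], [scale(y) for _, y in train])


def predict(t):
    return unscale(intercept + slope * t)


# Assess: RMSE on held-out data, compared with a naive training-mean baseline.
baseline = statistics.fmean(train_y)
for name, part in (("validation", valid), ("test", test)):
    actual = [y for _, y in part]
    model_rmse = rmse(actual, [predict(t) for t, _ in part])
    base_rmse = rmse(actual, [baseline] * len(part))
    reduction = 100 * (1 - model_rmse / base_rmse)
    print(f"{name}: model RMSE {model_rmse:.0f}, baseline RMSE {base_rmse:.0f}, "
          f"reduction {reduction:.1f}%")
```

The split is chronological rather than random because the observations are monthly; a random partition would let the model see months that come after the ones it is asked to predict. In practice the validation set would be used to choose among candidate models, with the test set reserved for the final assessment.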
[1] James Bennett, et al. The Netflix Prize, 2007.
[2] Gwilym M. Jenkins, et al. Time series analysis, forecasting and control, 1972.
[3] Jonathan R Anderson, et al. On Cooperative And Competitive Learning In The Management Classroom, 2006.
[4] Gwilym M. Jenkins, et al. Time series analysis, forecasting and control, 1971.
[5] Heikki Mannila, et al. Principles of Data Mining, 2001, Undergraduate Topics in Computer Science.
[6] A. Bandura. Social learning theory, 1977.
[7] George E. P. Box, et al. Time Series Analysis: Forecasting and Control, 1977.