Students revisit an activity conducted earlier in the semester in the unit on comparing groups with boxplots (Gummy Bears Activity in Lesson 2, Chapter 11). Once again, they design an experiment to compare the distances traveled by gummy bears launched from two different heights. The experiment is discussed, the students form groups, and the conditions are randomly assigned to the groups of students. This time a detailed protocol is developed and used that specifies exactly how students are to launch the gummy bears and measure the results. The data gathered this time seem to have less variability than the data from the earlier activity, which is desirable. The students enter the data into Fathom (Key Curriculum Press, 2006), which is used to generate graphs that are compared to the earlier results, showing less within-group variability this time due to the more detailed protocol. There is a discussion of between-group versus within-group variability, and of what the graphs suggest about true differences in distances. Fathom is then used to run a two-sample t test, and the results show a significant difference, indicated by a small P-value. Next, students have Fathom calculate a 95% confidence interval to estimate the true difference in mean distances. In discussing this experiment, the students revisit important concepts relating to designing experiments, how they are able to draw causal conclusions from this experiment, and the role of variability between and within groups. Connections are drawn between earlier topics and the topic of inference, as well as between tests of significance and confidence intervals in the context of a concrete experiment. The metaphor of making an argument is revisited from earlier uses in the course, this time in connection with the hypothesis test procedure.
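The link between within-group variability and the test result can be made concrete with a small sketch. The Python code below is not the Fathom procedure used in class; it is a minimal illustration, assuming an unpooled (Welch-style) two-sample t statistic, and every distance in it is hypothetical, invented only to show the effect. Both scenarios have the same difference in means, but the one with less within-group spread yields a much larger t statistic. The function name `two_sample_t` and the variable names are likewise illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def two_sample_t(low, high):
    """Return (difference in means, standard error, t statistic).

    Uses the unpooled standard error, an assumption made for this sketch.
    """
    diff = mean(high) - mean(low)  # between-group difference
    # Standard error built from the within-group sample variances
    se = sqrt(stdev(low) ** 2 / len(low) + stdev(high) ** 2 / len(high))
    return diff, se, diff / se


# Hypothetical launch distances (inches), invented for illustration only.
# "Loose" mimics launches without a detailed protocol: large spread.
loose_low, loose_high = [4, 12, 20], [14, 22, 30]
# "Tight" mimics launches with a detailed protocol: small spread.
tight_low, tight_high = [10, 12, 14], [20, 22, 24]

loose_diff, loose_se, loose_t = two_sample_t(loose_low, loose_high)
tight_diff, tight_se, tight_t = two_sample_t(tight_low, tight_high)

print(f"loose protocol: diff={loose_diff}, SE={loose_se:.2f}, t={loose_t:.2f}")
print(f"tight protocol: diff={tight_diff}, SE={tight_se:.2f}, t={tight_t:.2f}")
```

Because the t statistic divides the difference in means by a standard error built from the within-group spread, reducing that spread makes the same difference stand out more clearly.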
Links are shown between the claim (that higher stacks of books will launch bears farther), the evidence used to support the claim (the data gathered in the experiment), the quality and justification of the evidence (the experimental design, randomization, sample size), limitations in the evidence (small number of launches), and finally, an indicator of how convincing the argument is (the P-value). By discussing the idea of the