Data Management, Exploratory Data Analysis, and Regression Analysis with 1969–2000 Major League Baseball Attendance

The 1969–2000 Major League Baseball Attendance dataset contains Runs Scored, Runs Allowed, Wins, Losses, Number of Games Behind the Division Leader, and Home Game Attendance for each major league franchise for the 1969 through 2000 seasons. Also included for each franchise are its location, league affiliation (National or American), and division affiliation (East, Central, or West). These data have been used in a project-based modeling course to instruct students in basic data management, the use of exploratory data analysis to "clean" data, and the construction of regression models. The dataset, which is both cross-sectional and time-series, is of manageable size and is easily understood. Furthermore, it provides a useful, interesting, and realistic classroom example for discussing many important statistical concepts.
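
The three classroom activities named above (data management, exploratory "cleaning," and regression modeling) might be sketched as follows. This is a minimal illustration only, not the course's actual procedure: the records are invented toy values rather than values from the dataset, and the field names, the derived `win_pct` variable, the plausibility thresholds, and the choice of a one-predictor least-squares fit are all assumptions made for the example.

```python
from statistics import mean

# Invented toy records for illustration only; NOT values from the actual dataset.
# Field names are hypothetical stand-ins for Wins, Losses, and Home Game Attendance.
records = [
    {"team": "A", "year": 1990, "wins": 90, "losses": 72, "attendance": 2_400_000},
    {"team": "B", "year": 1990, "wins": 81, "losses": 81, "attendance": 2_000_000},
    {"team": "C", "year": 1990, "wins": 72, "losses": 90, "attendance": 1_600_000},
    # Deliberately implausible game count, to give the cleaning step something to catch.
    {"team": "D", "year": 1990, "wins": 60, "losses": 12, "attendance": 1_900_000},
]


def plausible(row, min_games=100, max_games=165):
    """Exploratory check: flag rows whose game count or attendance looks wrong.

    The thresholds are assumptions chosen for this sketch.
    """
    games = row["wins"] + row["losses"]
    return min_games <= games <= max_games and row["attendance"] > 0


def add_win_pct(row):
    """Data management: derive winning percentage from wins and losses."""
    return {**row, "win_pct": row["wins"] / (row["wins"] + row["losses"])}


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Clean first, then derive the predictor, then fit the model.
clean = [add_win_pct(r) for r in records if plausible(r)]
intercept, slope = fit_line(
    [r["win_pct"] for r in clean],
    [r["attendance"] for r in clean],
)
print(f"rows kept: {len(clean)} of {len(records)}")
print(f"attendance ~ {intercept:.0f} + {slope:.0f} * win_pct")
```

With real data, the same three steps would typically be applied to every franchise-season, and the panel structure (repeated seasons for each franchise) would usually call for a richer model than a single fitted line.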