Assessing Model Performance: Which Data to Use?