Rethinking Machine Learning Model Evaluation in Pathology