Recommendations on compiling test datasets for evaluating artificial intelligence solutions in pathology