Looking beyond general metrics for model comparison – lessons from an international model intercomparison study