Reliability of simulation-based assessment for practicing physicians: performance is context-specific