ON THE RELIABILITY OF DECISIONS IN DOMAIN‐REFERENCED TESTING