A Practical Probabilistic Benchmark for AI Weather Models