The Neural Testbed: Evaluating Predictive Distributions