Benchmarking Bayesian neural networks and evaluation metrics for regression tasks