Offline Policy Optimization in RL with Variance Regularization