Risk-Averse Trust Region Optimization for Reward-Volatility Reduction