Language Model Self-improvement by Reinforcement Learning Contemplation