Maximize Producer Rewards in Distributed Windmill Environments: A Q-Learning Approach