Effect of look-ahead search depth in learning position evaluation functions for Othello using -greedy exploration

This paper studies the effect of varying the depth of look-ahead for heuristic search in temporal difference (TD) learning and game playing. The acquisition position evaluation functions for the game of Othello is studied. The paper provides important insights into the strengths and weaknesses of using different search depths during learning when epsi-greedy exploration is applied. The main findings are that contrary to popular belief, for Othello, better playing strategies are found when TD learning is applied with lower look-ahead search depths