Paper information - Solving nonogram puzzles by reinforcement learning

Solving nonogram puzzles by reinforcement learning

Frederic Dandurand (frederic.dandurand@gmail.com)
Department of Psychology, Universite de Montreal, 90 ave. Vincent-d'Indy, Montreal, QC H2V 2S9 Canada

Denis Cousineau (denis.cousineau@uottawa.ca)
Ecole de psychologie, Pavillon Vanier, Universite d'Ottawa, 136 Jean Jacques Lussier, Ottawa, Ontario, K1N 6N5, Canada

Thomas R. Shultz (thomas.shultz@mcgill.ca)
Department of Psychology and School of Computer Science, McGill University, 1205 Penfield Avenue, Montreal, QC H3A 1B1 Canada

Abstract

We study solvers of nonogram puzzles, which are good examples of constraint-satisfaction problems. Given an optimal solving module for solving a given line, we compare performance of three algorithmic solvers used to select the order in which to solve lines with a reinforcement-learning-based solver. The reinforcement-learning (RL) solver uses a measure of reduction of distance to goal as a reward. We compare two methods for storing qualities (Q values) of state-action pairs, a lookup table and a connectionist function approximator. We find that RL solvers learn near-optimal solutions that also outperform a heuristic solver based on the explicit, general rules often given to nonogram players. Only RL solvers that use a connectionist function approximator generalize their knowledge to generate good solutions on about half of unseen problems; RL solvers based on lookup tables generalize to none of these untrained problems.

Keywords: Nonograms; problem solving; reinforcement learning; distance-based reward; SDCC.

Nonogram puzzles

Invented in Japan in the 1980s, nonograms (also called Hanjie, Paint by Numbers, or Griddlers) are logic puzzles in which problem solvers need to determine whether each cell of a rectangular array is empty or filled, given some constraints. Nonograms are interesting problems to study because they are good examples of constraint satisfaction problems (Russell & Norvig, 2003), which are ubiquitous in real life (Shultz, 2001). Furthermore, despite their popularity among puzzle players, little work on nonograms exists in cognitive science, either in the form of empirical studies or modeling work. But nonograms have attracted attention in other areas. For instance, as we will see in the literature review section, solving nonograms has been studied mathematically, and a number of machine solvers exist. Finally, many rules and strategies for human players are described in web sites.

In nonograms, constraints take the form of series of numbers at the head of each line (row or column) indicating the size of blocks of contiguous filled cells found on that line. For example, in Figure 1, the first row contains 2 blocks of 2 filled cells, whereas row 5 contains no block of filled cells. Blocks have to be separated by at least one empty cell. At the beginning, the state of all cells is unknown (often portrayed visually by a grey color), and the goal is to determine if each cell is empty (white) or filled (black), while satisfying all of the numerical constraints.

Figure 1 - Example of a 5x5 nonogram puzzle. In the initial state presented here, all cells are grey to indicate that the problem solver does not know yet if they should be filled (black) or empty (white).

Strategies for solving nonograms

To solve nonograms, two important activities are necessary. First, the problem solver needs to decide which line (row or column) to solve next, and then to actually solve that line. Problem solvers typically need to iterate through the lines, progressively gathering more and more information about whether cells are empty or filled, until the actual state of every cell is known. Just as in crossword puzzles where the found words provide letters as clues or constraints for the orthogonally intersecting words, partially solving the cells on a nonogram line provides additional constraints for the intersecting lines.
A survey of popular web sites giving advice and tips on how to solve nonogram puzzles was performed, focusing on categorizing advice on selection of a line to solve, or on how to solve a given line. The majority of the advice relates to solving lines. For instance, an exhaustive set of rules can be found on Wikipedia (January 10, 2012 version). In contrast, there is comparatively little advice on how to appropriately select the next line to solve, and much of this

Denis Cousineau | Thomas R. Shultz | Frédéric Dandurand

[1] Rion Snow, et al. A combinatorial problem associated with nonograms, 2006.

[2] A. Inkeles, et al. International Encyclopedia of the Social Sciences, 1968.

[3] Kees Joost Batenburg, et al. A Discrete Tomography Approach to Japanese Puzzles, 2005.

[4] T. Shultz. Computational Developmental Psychology, 2003.

[5] Jinn-Tsong Tsai, et al. Learning Intelligent Genetic Algorithms Using Japanese Nonograms, 2012, IEEE Transactions on Education.

[6] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.

[7] Thomas R. Shultz, et al. Constraint-Satisfaction Models, 2001.

[8] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[9] Michael McCloskey, et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, 1989.

[10] Francisco Azevedo, et al. Colored Nonograms: An Integer Linear Programming Approach, 2009, EPIA.

[11] A. Reber. Implicit learning and tacit knowledge, 1993.

[12] Doina Precup, et al. Combining TD-learning with Cascade-correlation Networks, 2003, ICML.

[13] Shi-Jim Yen, et al. Optimization of Nonogram's Solver by Using an Efficient Algorithm, 2010, 2010 International Conference on Technologies and Applications of Artificial Intelligence.

[14] Kees Joost Batenburg, et al. Solving Nonograms by combining relaxations, 2009, Pattern Recognit.

[15] Ron Sun, et al. Cognition and Multi-Agent Interaction: The CLARION Cognitive Architecture: Extending Cognitive Modeling to Social Simulation, 2005.

[16] Thomas R. Shultz, et al. Including cognitive biases and distance-based rewards in a connectionist model of complex problem solving, 2012, Neural Networks.

[17] Christian Lebiere, et al. The Cascade-Correlation Learning Architecture, 1989, NIPS.

[18] Ling-Hwei Chen, et al. An efficient algorithm for solving nonograms, 2009, Applied Intelligence.