A probabilistic approach to solving crossword puzzles

We attacked the problem of solving crossword puzzles by computer: given a set of clues and a crossword grid, try to maximize the number of words correctly filled in. After an analysis of a large collection of puzzles, we decided to use an open architecture in which independent programs specialize in solving specific types of clues, drawing on ideas from information retrieval, database search, and machine learning. Each expert module generates a (possibly empty) candidate list for each clue, and the lists are merged together and placed into the grid by a centralized solver. We used a probabilistic representation as a common interchange language between subsystems and to drive the search for an optimal solution. PROVERB, the complete system, averages 95.3% words correct and 98.1% letters correct in under 15 minutes per puzzle on a sample of 370 puzzles taken from the New York Times and several other puzzle sources. This corresponds to missing roughly 3 words or 4 letters on a daily 15×15 puzzle, making PROVERB a better-than-average cruciverbalist (crossword solver).
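To make the "probabilistic interchange language" concrete, here is a toy sketch of the pipeline under loudly stated assumptions. It is not PROVERB's actual merger or solver. The module scores, the per-module weights (`MODULE_WEIGHTS`), the clue slots, and every function name are hypothetical. Candidate lists are merged as a simple weighted mixture of normalized distributions. The centralized solver is replaced by brute-force enumeration of grid fills, which computes exact per-slot marginals but is only feasible for tiny grids. A real puzzle would need an approximate search.

```python
from itertools import product

# Hypothetical expert outputs: for each clue slot, one dict per module
# mapping candidate -> raw score. An empty dict means "no candidates".
EXPERT_LISTS = {
    "1A": [{"CAT": 3.0, "COT": 1.0}, {"CAT": 1.0, "OAT": 1.0}],
    "1D": [{"OWL": 2.0, "CAB": 2.0}, {}],
}
# Hypothetical confidence weight for each module (same order as above).
MODULE_WEIGHTS = [0.7, 0.3]
# Each crossing (slot_a, i, slot_b, j) requires slot_a[i] == slot_b[j].
CROSSINGS = [("1A", 0, "1D", 0)]


def normalize(scores):
    """Turn raw scores into a probability distribution (empty stays empty)."""
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()} if total > 0 else {}


def merge(lists, weights):
    """Weighted mixture of the non-empty modules' distributions for one clue."""
    merged, used = {}, 0.0
    for scores, weight in zip(lists, weights):
        dist = normalize(scores)
        if not dist:
            continue  # a module with no candidates does not dilute the others
        used += weight
        for word, p in dist.items():
            merged[word] = merged.get(word, 0.0) + weight * p
    return {w: p / used for w, p in merged.items()} if used > 0 else {}


def slot_marginals(cands, crossings):
    """Exact per-slot marginals by enumerating every grid fill.

    A fill's weight is the product of its words' merged probabilities,
    and fills that disagree at a crossing get weight zero. Enumeration is
    exponential in the number of slots, so this only suits toy grids.
    """
    slots = sorted(cands)
    totals = {s: dict.fromkeys(cands[s], 0.0) for s in slots}
    z = 0.0
    for words in product(*(list(cands[s]) for s in slots)):
        fill = dict(zip(slots, words))
        if any(fill[a][i] != fill[b][j] for a, i, b, j in crossings):
            continue
        weight = 1.0
        for s in slots:
            weight *= cands[s][fill[s]]
        z += weight
        for s in slots:
            totals[s][fill[s]] += weight
    if z == 0:
        raise ValueError("no fill is consistent with the crossings")
    return {s: {w: v / z for w, v in totals[s].items()} for s in slots}


merged = {slot: merge(lists, MODULE_WEIGHTS) for slot, lists in EXPERT_LISTS.items()}
marg = slot_marginals(merged, CROSSINGS)
# Picking each slot's highest-marginal word maximizes the expected number
# of words correct under this model (the picks need not form a legal grid).
best = {s: max(dist, key=dist.get) for s, dist in marg.items()}
print(best)
```

Running it prints `{'1A': 'CAT', '1D': 'CAB'}`. On its own, 1-Down is an even split between OWL and CAB. The crossing with 1-Across, where CAT is favored, pulls CAB's marginal up to about 0.85. This illustrates why a shared probabilistic representation lets evidence from one clue sharpen the answer to another.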
