Products of weighted logic programs

Weighted logic programming, a generalization of bottom-up logic programming, is well suited to specifying dynamic programming algorithms. In this setting, proofs correspond to the algorithm's output space, such as a path through a graph or a grammatical derivation, and each proof is given a real-valued score (often interpreted as a probability) that depends on the real weights of the base axioms used in the proof. The desired output is a function over all possible proofs, such as a sum of scores or an optimal score. We describe the product transformation, which merges two weighted logic programs into a new one. The resulting program optimizes a product of proof scores from the original programs, constituting a scoring function known in machine learning as a “product of experts.” Through the addition of intuitive constraining side conditions, we show that several important dynamic programming algorithms can be derived by applying product to simpler weighted logic programs. In addition, we show how the computation of Kullback–Leibler divergence, an information-theoretic measure, can be interpreted using product.
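As a rough illustration of the idea, the sketch below treats a weighted logic program as a set of weighted, labeled edge axioms with a single rule, path(v) max= path(u) * edge(u, a, v), solved by forward chaining in the max-times semiring. The function names `solve` and `product_program`, the two toy programs, and the choice of "labels must match" as the side condition are all assumptions made for this example; it is not the formal product transformation, only a small instance of pairing items from two programs and multiplying their scores.

```python
def solve(edges, start):
    """Forward chaining in the max-times semiring (hypothetical helper).

    edges maps (u, label, v) -> weight; value[v] is the best score of any
    proof of path(v), with the axiom path(start) = 1. Assumes weights <= 1.
    """
    value = {start: 1.0}
    changed = True
    while changed:
        changed = False
        for (u, _label, v), w in edges.items():
            if u in value and value[u] * w > value.get(v, 0.0):
                value[v] = value[u] * w
                changed = True
    return value


def product_program(edges1, edges2):
    """Pair the edge axioms of two programs (hypothetical helper).

    The side condition a == b forces both proofs to use the same label at
    each step; the paired axiom's weight is the product of the two weights.
    """
    paired = {}
    for (u1, a, v1), w1 in edges1.items():
        for (u2, b, v2), w2 in edges2.items():
            if a == b:
                paired[((u1, u2), a, (v1, v2))] = w1 * w2
    return paired


# Two toy "experts" that score two-symbol sequences over the labels x and y.
expert1 = {(0, "x", 1): 0.6, (0, "y", 1): 0.4,
           (1, "x", 2): 0.5, (1, "y", 2): 0.5}
expert2 = {("s", "x", "t"): 0.1, ("s", "y", "t"): 0.9,
           ("t", "x", "f"): 0.7, ("t", "y", "f"): 0.3}

joint = solve(product_program(expert1, expert2), start=(0, "s"))
best = joint[(2, "f")]  # best product of the two experts' proof scores
print(best)  # approximately 0.126, reached by the sequence y, x
```

Each expert alone would prefer a different sequence start (x for the first, y for the second); the product program finds the sequence whose combined score is highest. Replacing max with a sum in `solve` would give the total score over all paired proofs instead of the optimum.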
