Bayesian Analysis in Natural Language Processing

Bayesian analysis provides an elegant and unified way to incorporate prior knowledge and to manage uncertainty over parameters. It can also provide capacity control for complex models as an alternative to smoothing (a connection sketched at the end of this review). Bayesian techniques have been applied successfully across natural language processing (NLP); examples include word segmentation (Goldwater et al. 2009), syntax (Johnson et al. 2007), morphology (Snyder & Barzilay 2008), coreference resolution (Haghighi & Klein 2007), and machine translation (Blunsom et al. 2009).

Cohen’s book provides an accessible yet in-depth introduction to these techniques. It is aimed at researchers and students who are already familiar with statistical modeling of natural language (i.e., at the level of introductory books such as Manning & Schütze [1999] or Jurafsky & Martin [2009]). The book’s stated goal is to “cover the methods and algorithms that are needed to fluently read Bayesian learning papers in NLP and to do research in the area.” I believe Cohen achieves this goal, striking a nice balance between breadth and depth of material.

Chapter 1 is a brief review of probability and statistics. It covers prerequisite concepts such as independence, conditional independence, and exchangeability of random variables (the last of which is defined below). The differences between the Bayesian and frequentist philosophies are discussed, albeit briefly; in general, the book maintains a pragmatic approach, focusing more on the mathematics and less on the philosophy.
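Since exchangeability is likely the least familiar of these prerequisites, it may help to state it explicitly: a sequence of random variables $X_1, \ldots, X_n$ is exchangeable if its joint distribution is invariant under permutation, i.e., for every permutation $\pi$ of $\{1, \ldots, n\}$,

$$p(x_1, \ldots, x_n) = p(x_{\pi(1)}, \ldots, x_{\pi(n)}).$$

By de Finetti’s theorem, an infinite exchangeable sequence behaves as if its elements were drawn i.i.d. conditioned on a latent parameter, which is the structural assumption behind placing priors over model parameters.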
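To make the smoothing connection from the opening paragraph concrete, here is a minimal Python sketch (my own illustration, not code from the book; the function name and toy corpus are invented for exposition). Placing a symmetric Dirichlet prior over the parameters of a categorical unigram model gives a posterior predictive distribution that coincides with additive (Laplace) smoothing:

    from collections import Counter

    def posterior_predictive(counts, vocab, alpha=1.0):
        """Posterior predictive P(w) of a categorical unigram model under a
        symmetric Dirichlet(alpha) prior: (n_w + alpha) / (N + alpha * |V|).
        With alpha = 1 this is exactly additive (Laplace) smoothing."""
        total = sum(counts.values())
        denom = total + alpha * len(vocab)
        return {w: (counts.get(w, 0) + alpha) / denom for w in vocab}

    # Toy corpus; the prior keeps the unseen word "fish" from receiving zero mass.
    tokens = "the cat sat on the mat".split()
    vocab = set(tokens) | {"fish"}
    probs = posterior_predictive(Counter(tokens), vocab)
    print(round(probs["the"], 3), round(probs["fish"], 3))  # 0.25 0.083

Larger values of alpha pull the estimates toward the uniform distribution, which is one sense in which a prior provides capacity control.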