Probabilistic Grammar in Natural Language Processing (revised)

A probabilistic grammar assigns a probability to a sentence or string of words, extending a plain context-free grammar (CFG) with the information needed to rank competing syntactic analyses. A probabilistic context-free grammar (PCFG) is a context-free grammar in which every rule is annotated with the probability of choosing that rule to expand its left-hand side, so the probabilities of all rules sharing a left-hand side sum to one. Each PCFG rule application is treated as if it were conditionally independent of the others; thus the probability of a parse is the product of the probabilities of the rules used in it, and the probability of a sentence is the sum of the probabilities of its parses. The CYK (Cocke-Younger-Kasami) algorithm is a bottom-up dynamic programming parsing algorithm for grammars in Chomsky normal form. It can be augmented to compute the probability of a parse while it is parsing a sentence. PCFG probabilities can be learned by counting rule occurrences in a parsed corpus (treebank) or, when no treebank is available, by parsing an unannotated corpus. In the latter case the sentences being parsed may be ambiguous, and the Inside-Outside algorithm deals with this by weighting each rule count by the probability of the parses in which the rule occurs. A probabilistic lexicalized context-free grammar augments a PCFG with a lexical head for each rule, so that the probability of a rule can be conditioned on its lexical head or on nearby heads.
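As a rough illustration of how CYK can carry probabilities along while it parses, the Python sketch below fills two charts for a toy grammar in Chomsky normal form: one keeps the probability of the best subtree for each span, the other sums over all subtrees. The grammar, its probabilities, the example sentence and the function name pcky are all invented for this illustration and are not taken from any corpus or library.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form; all probabilities are invented.
# Rules that share a left-hand side have probabilities summing to 1.
BINARY_RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
}
LEXICAL_RULES = {
    ("NP", "she"): 0.3,
    ("NP", "stars"): 0.3,
    ("NP", "telescopes"): 0.2,
    ("V", "sees"): 1.0,
    ("P", "with"): 1.0,
}


def pcky(words):
    """Return (best parse probability, total sentence probability) for root S."""
    n = len(words)
    # best[i][j][A]: highest probability of a subtree rooted in A over words[i:j]
    # inside[i][j][A]: summed probability of all such subtrees
    best = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    inside = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Spans of length 1 come straight from the lexical rules.
    for i, word in enumerate(words):
        for (lhs, terminal), p in LEXICAL_RULES.items():
            if terminal == word:
                best[i][i + 1][lhs] = max(best[i][i + 1][lhs], p)
                inside[i][i + 1][lhs] += p

    # Longer spans are built bottom-up from every split point k.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (lhs, (left, right)), p in BINARY_RULES.items():
                    b_left = best[i][k].get(left, 0.0)
                    b_right = best[k][j].get(right, 0.0)
                    if b_left == 0.0 or b_right == 0.0:
                        continue  # this rule cannot cover the span at this split
                    # Rule independence: multiply the rule and subtree probabilities.
                    best[i][j][lhs] = max(best[i][j][lhs], p * b_left * b_right)
                    inside[i][j][lhs] += p * inside[i][k][left] * inside[k][j][right]

    return best[0][n].get("S", 0.0), inside[0][n].get("S", 0.0)


if __name__ == "__main__":
    best_prob, sentence_prob = pcky("she sees stars with telescopes".split())
    print(best_prob, sentence_prob)
```

With this toy grammar the sentence has two parses, one attaching "with telescopes" to the verb phrase (probability about 0.00378) and one attaching it to "stars" (about 0.00252). The function therefore returns roughly 0.00378 for the best parse and roughly 0.0063 for the sentence as a whole. A fuller implementation would also store back-pointers to recover the best tree and would work in log space to avoid underflow on long sentences.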