Fast EM Learning of a Family of PCFGs Kameya, Yoshitaka (ns Solutions)

The purpose of this report is to optimize the Inside-Outside algorithm and realize fast EM learning of various extensions of PCFGs. The optimization rests on two contributions: a new data structure called the support graph, which hierarchically represents a set of parse trees, and a new EM learning algorithm called the graphical EM algorithm, which runs on support graphs. Learning experiments with PCFGs on moderately sized Japanese corpora with contrasting characteristics indicate that, in terms of updating time per iteration, the graphical EM algorithm can learn the parameters of PCFGs orders of magnitude faster than the Inside-Outside algorithm. We also show experimentally that learning the parameters of extended PCFGs (a pseudo PCSG and a lexicalized PCFG) with the graphical EM algorithm takes at most about twice as long as learning those of a pure PCFG.
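
For context, the baseline being optimized, the Inside-Outside algorithm, relies on inside probabilities computed over the spans of a sentence. The following is a minimal Python sketch of that inside pass only, for a toy grammar in Chomsky normal form. It is not the graphical EM algorithm of this report; the grammar, the function name `inside_probabilities`, and the dictionary layout are illustrative assumptions.

```python
from collections import defaultdict


def inside_probabilities(words, lexical, binary):
    """Return beta[(i, j, A)] = P(A derives words[i:j]) for a CNF PCFG.

    lexical maps (A, word) -> probability of the rule A -> word.
    binary maps (A, B, C) -> probability of the rule A -> B C.
    Both layouts are assumptions made for this sketch.
    """
    n = len(words)
    beta = defaultdict(float)

    # Width-1 spans: apply lexical rules A -> word.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                beta[(i, i + 1, a)] += p

    # Wider spans: apply binary rules A -> B C, summing over split points k.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = beta.get((i, k, b), 0.0)
                    right = beta.get((k, j, c), 0.0)
                    if left and right:
                        beta[(i, j, a)] += p * left * right
    return beta


# Hypothetical toy grammar: S -> S S (0.4) | 'a' (0.6).
lexical = {("S", "a"): 0.6}
binary = {("S", "S", "S"): 0.4}

words = ["a", "a", "a"]
beta = inside_probabilities(words, lexical, binary)

# "a a a" has two parse trees, each with probability 0.4**2 * 0.6**3,
# so the sentence probability is their sum.
sentence_prob = beta[(0, len(words), "S")]
print(sentence_prob)
```

In this sketch the loops visit every span, split point, and rule, whether or not the combination takes part in any parse of the sentence. The support graphs described above instead represent the set of parse trees itself.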