Factoring Synchronous Grammars by Sorting

Synchronous Context-Free Grammars (SCFGs) have been successfully exploited as translation models in machine translation applications. When parsing with an SCFG, computational complexity grows exponentially with the length of the rules, in the worst case. In this paper we examine the problem of factorizing each rule of an input SCFG to a generatively equivalent set of rules, each having the smallest possible length. Our algorithm works in time O(n log n), for each rule of length n. This improves upon previous results and solves an open problem about recognizing permutations that can be factored.