Computer-assisted grammar construction

This paper presents a system for computer-assisted grammar construction (CAGC). The CAGC system is designed to generate broad-coverage grammars for large natural language corpora. It combines an extended inside-outside algorithm with an automatic phrase bracketing (AUTO) system, which supplies the extended algorithm with constituent information during learning. The paper demonstrates that the CAGC system can handle realistic natural language problems, and that the AUTO system is useful for grammar re-estimation based on the inside-outside algorithm. Performance results are presented for a grammar constructed for the Wall Street Journal (WSJ) corpus, including an analysis of the degree of coverage and of bracketing precision.
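The abstract does not specify how constituent information enters the extended inside-outside algorithm. One well-known approach is the partially bracketed variant of Pereira and Schabes, in which any span that crosses a known bracket receives zero inside probability. The sketch below shows only that inside pass, for a PCFG in Chomsky normal form. It assumes this style of bracket constraint, and it is not the CAGC or AUTO implementation. The names `crosses`, `bracketed_inside`, and the toy grammar are hypothetical.

```python
from collections import defaultdict

def crosses(i, j, brackets):
    """True if span [i, j) crosses a bracket (a, b): they overlap, neither contains the other."""
    return any((a < i < b < j) or (i < a < j < b) for a, b in brackets)

def bracketed_inside(words, lexical, binary, brackets, start="S"):
    """Inside probability of `words` under a CNF PCFG, with crossing spans forced to zero.

    Assumed (hypothetical) encodings:
      lexical:  {(A, word): prob}     for rules A -> word
      binary:   {(A, B, C): prob}     for rules A -> B C
      brackets: [(a, b), ...]         half-open constituent spans, e.g. from a bracketer
    """
    n = len(words)
    inside = defaultdict(float)  # (nonterminal, i, j) -> inside probability
    # Width-1 spans can never cross a bracket, so lexical rules apply unconstrained.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                inside[(a, i, i + 1)] += p
    # Build wider spans bottom-up, skipping any span incompatible with the brackets.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if crosses(i, j, brackets):
                continue  # the bracket constraint: this span gets probability 0
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = inside.get((b, i, k), 0.0)
                    right = inside.get((c, k, j), 0.0)
                    if left and right:
                        inside[(a, i, j)] += p * left * right
    return inside.get((start, 0, n), 0.0)

# Toy ambiguous grammar: "a b c" parses as [[a b] c] or [a [b c]].
words = ["a", "b", "c"]
lexical = {("A", "a"): 1.0, ("B", "b"): 1.0, ("C", "c"): 1.0}
binary = {("S", "X", "C"): 0.5, ("S", "A", "Y"): 0.5,
          ("X", "A", "B"): 1.0, ("Y", "B", "C"): 1.0}

print(bracketed_inside(words, lexical, binary, []))        # 1.0 (both parses count)
print(bracketed_inside(words, lexical, binary, [(0, 2)]))  # 0.5 (only [[a b] c] survives)
```

Without brackets, both analyses contribute to the sentence probability. The bracket `(0, 2)` rules out the span `[b c]`, so only the left-branching parse remains. In a full re-estimation loop, the same constraint would also be applied to the outside pass and to the expected rule counts.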