Chinese Treebanks and Grammar Extraction

Preparing a knowledge bank is a difficult task. In this paper, we discuss knowledge extraction from the manually examined Sinica Treebank. We obtain categorical information, word-to-word relations, word collocations, new syntactic patterns, and sentence structures. We also developed a search system for Chinese sentence structures; using pre-extracted data and SQL commands, it answers user queries efficiently. We analyze the extracted grammars to study the tradeoffs between the granularity of the grammar rules and their coverage and ambiguity. This analysis indicates how large a treebank must be to suffice for grammar extraction. Finally, we examine the tradeoffs between grammar coverage and ambiguity using parsing results obtained with grammar rules of different granularity.
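As a rough illustration of what extracting grammar rules from a treebank and measuring their coverage can look like, the following minimal sketch reads bracketed trees, counts phrase-level rules, and reports the fraction of rule occurrences in held-out trees that were already seen in training. It is not the procedure used in the paper: the Penn-style bracket notation, the toy sentences, and the function names parse_tree, extract_rules, and coverage are all assumptions made for this example, and the actual Sinica Treebank encoding differs.

```python
from collections import Counter

def parse_tree(text):
    """Parse a bracketed tree such as "(S (NP (N 他)) (VP (V 跑)))" into (label, children).
    The bracket format is an assumption for this sketch, not the Sinica Treebank format."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                              # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                              # consume ")"
        return (label, children)

    return node()

def extract_rules(tree, rules=None):
    """Count phrase-level rules (parent -> child labels); lexical rules are skipped."""
    if rules is None:
        rules = Counter()
    label, children = tree
    kids = [c for c in children if isinstance(c, tuple)]
    if kids:
        rules[(label, tuple(k[0] for k in kids))] += 1
        for k in kids:
            extract_rules(k, rules)
    return rules

def coverage(train_rules, test_rules):
    """Fraction of rule occurrences in the test trees whose rule was seen in training."""
    total = sum(test_rules.values())
    seen = sum(n for rule, n in test_rules.items() if rule in train_rules)
    return seen / total if total else 0.0

# Toy data, invented for illustration only.
train = ["(S (NP (N 他)) (VP (V 跑)))",
         "(S (NP (N 我)) (VP (V 看) (NP (N 書))))"]
test = ["(S (NP (N 她)) (VP (V 吃) (NP (N 飯)) (PP (P 在) (NP (N 家)))))"]

train_rules = Counter()
for sentence in train:
    extract_rules(parse_tree(sentence), train_rules)

test_rules = Counter()
for sentence in test:
    extract_rules(parse_tree(sentence), test_rules)

print(f"distinct training rules: {len(train_rules)}")
print(f"rule coverage on held-out trees: {coverage(train_rules, test_rules):.2f}")
```

Making the labels finer grained (for example, by attaching extra features to each category) would yield more distinct rules and typically lower coverage for the same amount of training data, which is the kind of tradeoff the abstract refers to.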
