Learning Tree Patterns for Syntactic Parsing

This paper presents a machine learning method for the syntactic parsing of Hungarian text. The method extracts the learner's initial grammar from an annotated corpus with the help of tree shapes. The PGS algorithm, an improved version of the RGLearn algorithm, was developed and applied to learn tree patterns with various phrase types described by regular expressions. The method also estimates a probability for each learned tree pattern. A parser built on the learned grammar uses the Viterbi algorithm to search quickly for the most probable derivation of a sentence. The results were built into an information extraction pipeline.
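To make the idea of a Viterbi search for the most probable derivation concrete, the following is a minimal sketch, not the paper's parser. It assumes a generic probabilistic context-free grammar in Chomsky normal form rather than the learned tree patterns, and it does not implement PGS or RGLearn. The rules, probabilities, English vocabulary, and the name `viterbi_parse` are all invented for illustration.

```python
import math
from collections import defaultdict

# Toy grammar (an assumption for illustration, not taken from the paper).
# Binary rules: (parent, (left_child, right_child)) -> probability.
BINARY = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("VP", ("V", "NP")): 1.0,
}
# Lexical rules: (label, word) -> probability.
LEXICAL = {
    ("Det", "the"): 1.0,
    ("N", "dog"): 0.5,
    ("N", "cat"): 0.5,
    ("V", "sees"): 1.0,
    ("NP", "dogs"): 0.4,
}


def viterbi_parse(words, root="S"):
    """Return (probability, tree) of the most probable derivation, or None."""
    n = len(words)
    # best[(i, j)][label] = (log probability, tree) for the span words[i:j].
    best = defaultdict(dict)

    # Fill in spans of length one from the lexical rules.
    for i, word in enumerate(words):
        for (label, rule_word), p in LEXICAL.items():
            if rule_word == word:
                best[(i, i + 1)][label] = (math.log(p), (label, word))

    # Combine shorter spans into longer ones, keeping only the best
    # analysis per label and span (the Viterbi maximisation step).
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (parent, (left, right)), p in BINARY.items():
                    if left in best[(i, k)] and right in best[(k, j)]:
                        lp_left, t_left = best[(i, k)][left]
                        lp_right, t_right = best[(k, j)][right]
                        score = math.log(p) + lp_left + lp_right
                        current = best[(i, j)].get(parent)
                        if current is None or score > current[0]:
                            best[(i, j)][parent] = (score, (parent, t_left, t_right))

    if root not in best[(0, n)]:
        return None
    log_prob, tree = best[(0, n)][root]
    return math.exp(log_prob), tree


if __name__ == "__main__":
    print(viterbi_parse("the dog sees the cat".split()))
```

Because each chart cell keeps only the highest-scoring analysis per label, the search runs in time cubic in the sentence length instead of enumerating every derivation. Working in log probabilities avoids numerical underflow on longer sentences.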
