Learning Grammar Specifications from IGT: A Case Study of Chintang

We present a case study of a methodology that uses information extracted from interlinear glossed text (IGT) to create actual working HPSG grammar fragments with the Grammar Matrix, focusing on one language: Chintang. Though the results are barely measurable in terms of coverage over running text, they nonetheless provide a proof of concept. Our experience report reflects on the ways in which this task is non-trivial and on mismatches between the assumptions of the methodology and the realities of IGT as produced in a large-scale field project.
