Towards Creating Precision Grammars from Interlinear Glossed Text: Inferring Large-Scale Typological Properties

We propose to bring together two kinds of linguistic resources—interlinear glossed text (IGT) and a language-independent precision grammar resource—to automatically create precision grammars in the context of language documentation. This paper takes the first steps in that direction by extracting major-constituent word order and case system properties from IGT for a diverse sample of languages.
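The two target properties can be made concrete with a minimal, hypothetical sketch. This is not the authors' system. It assumes each IGT clause has already been annotated with the positions of its subject, object and verb in the language line, plus the case gram (e.g. ERG, NOM) found on each argument in the gloss line. Obtaining those annotations, for instance by projecting a parse of the translation line onto the language line, is the hard part and is not shown. The `Clause` record, the 0.5 dominance threshold and the alignment labels are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clause:
    """One IGT clause after (hypothetical) role annotation.

    Positions index words in the language line; cases are gram labels
    taken from the gloss line, or None when the argument is unmarked.
    """
    verb_pos: int
    subj_pos: int
    obj_pos: Optional[int] = None    # None means the clause is intransitive
    subj_case: Optional[str] = None
    obj_case: Optional[str] = None


def clause_order(c: Clause) -> Optional[str]:
    """Return the S/O/V order of a transitive clause, e.g. 'SOV'."""
    if c.obj_pos is None:
        return None
    roles = sorted([(c.subj_pos, "S"), (c.obj_pos, "O"), (c.verb_pos, "V")])
    return "".join(role for _, role in roles)


def dominant_order(clauses: List[Clause], threshold: float = 0.5) -> Optional[str]:
    """Most frequent order, if it covers more than `threshold` of transitive clauses."""
    counts = Counter(o for o in map(clause_order, clauses) if o is not None)
    if not counts:
        return None  # no transitive clauses to judge from
    order, n = counts.most_common(1)[0]
    return order if n / sum(counts.values()) > threshold else "no dominant order"


def case_system(clauses: List[Clause]) -> Optional[str]:
    """Compare the usual marking of A (transitive subject), S and O."""
    def usual(grams: List[str]) -> Optional[str]:
        return Counter(grams).most_common(1)[0][0] if grams else None

    trans = [c for c in clauses if c.obj_pos is not None]
    intrans = [c for c in clauses if c.obj_pos is None]
    a = usual([c.subj_case or "" for c in trans])   # "" = unmarked
    o = usual([c.obj_case or "" for c in trans])
    s = usual([c.subj_case or "" for c in intrans])
    if a is None or o is None or s is None:
        return None  # need both transitive and intransitive evidence
    if a == s == o:
        return "neutral"
    if a == s:
        return "nominative-accusative"
    if s == o:
        return "ergative-absolutive"
    return "tripartite" if a != o else "other"


if __name__ == "__main__":
    sample = [
        Clause(verb_pos=2, subj_pos=0, obj_pos=1, subj_case="ERG"),
        Clause(verb_pos=2, subj_pos=0, obj_pos=1, subj_case="ERG"),
        Clause(verb_pos=1, subj_pos=0),
    ]
    print(dominant_order(sample))  # SOV
    print(case_system(sample))     # ergative-absolutive
```

The alignment check needs both transitive and intransitive clauses because S is the pivot. Nominative-accusative systems mark S like A, and ergative-absolutive systems mark S like O. The residual "other" label covers the case where A and O pattern together against S, which this sketch does not attempt to name.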