Joint Grammar Development by Linguists and Computer Scientists

For languages with inflectional morphology, building a morphological parser can be a bottleneck to further development. We focus on two difficulties: first, finding people with expertise in both computer programming and the linguistics of a particular language, and second, the short lifetime of software such as parsers. We describe a methodology that splits parser building into two tasks: descriptive grammar development and formal grammar development. The two grammars are combined into a single document using Literate Programming. The formal grammar is designed to be independent of any particular parsing engine’s programming language, so that it can be readily ported to a new parsing engine, which helps address the software lifetime problem.
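As a rough illustration of the literate-programming idea, the sketch below keeps prose description and formal rules in one document, extracts ("tangles") the formal rules, and renders them in two different surface syntaxes. It is a minimal sketch under stated assumptions, not the authors' implementation: the `<<rule>>`/`<<end>>` markers, the toy rule notation, and the two output formats are all hypothetical and stand in for whatever markup and parsing engines a real project would use.

```python
import re

# Hypothetical literate grammar: descriptive prose interleaved with formal
# fragments. The <<rule>> ... <<end>> delimiters are an assumed convention.
LITERATE_GRAMMAR = """\
Plural nouns consist of a stem followed by the plural suffix.
<<rule>>
Noun -> Stem PluralSuffix
<<end>>
The plural suffix is realized as "s".
<<rule>>
PluralSuffix -> "s"
<<end>>
"""

# Non-greedy match so each fragment is captured separately.
FRAGMENT = re.compile(r"<<rule>>\n(.*?)\n<<end>>", re.DOTALL)


def tangle(document):
    """Extract the formal rules as (lhs, rhs_symbols) pairs, dropping prose."""
    rules = []
    for body in FRAGMENT.findall(document):
        for line in body.splitlines():
            lhs, rhs = line.split("->", 1)
            rules.append((lhs.strip(), rhs.split()))
    return rules


def render_arrow(rules):
    """Render rules for a hypothetical engine that uses 'A -> B C' syntax."""
    return "\n".join(f"{lhs} -> {' '.join(rhs)}" for lhs, rhs in rules)


def render_bnf(rules):
    """Render the same rules in a BNF-like syntax for a second hypothetical engine."""
    def symbol(s):
        # Quoted strings are terminals; everything else is a nonterminal.
        return s if s.startswith('"') else f"<{s}>"
    return "\n".join(
        f"<{lhs}> ::= {' '.join(symbol(s) for s in rhs)}" for lhs, rhs in rules
    )


if __name__ == "__main__":
    rules = tangle(LITERATE_GRAMMAR)
    print(render_arrow(rules))
    print(render_bnf(rules))
```

Because the rules are held in an engine-neutral form after extraction, targeting a new parsing engine in this sketch only requires a new rendering function; the literate document itself is left unchanged.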
