MaltParser: A Language-Independent System for Data-Driven Dependency Parsing

Parsing unrestricted text is useful for many language technology applications but requires parsing methods that are both robust and efficient. MaltParser is a language-independent system for data-driven dependency parsing that can be used to induce a parser for a new language from a treebank sample in a simple yet flexible manner. Experimental evaluation confirms that MaltParser can achieve robust, efficient and accurate parsing for a wide range of languages without language-specific enhancements and with rather limited amounts of training data.
