Dependency parsing with finite state transducers and compression rules

Abstract. This article proposes a syntactic parsing strategy based on a dependency grammar containing formal rules and a compression technique that reduces the complexity of those rules. Compression parsing is mainly driven by the 'single-head' constraint of Dependency Grammar, and can be seen as an alternative to the well-known constructive strategy. The compression algorithm simplifies the input sentence by progressively removing dependent tokens from it as soon as binary syntactic dependencies are recognized. This strategy is thus similar to that used in deterministic dependency parsing. A compression parser was implemented and released under the General Public License, together with a cross-lingual grammar based on Universal Dependencies that contains only broad-coverage rules applied to Romance languages. The system is an almost delexicalized parser that needs no training data to analyze Romance languages. The rule-based cross-lingual parser was submitted to the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. The performance of our system was compared with that of the supervised systems participating in the competition, paying special attention to the parsing of different treebanks of the same language. We also trained a supervised delexicalized parser for Romance languages in order to compare it with our rule-based system. The results show that the performance of our cross-lingual method remains stable across related languages and across different treebanks, while most supervised methods turn out to be highly dependent on the text domain used to train the system.
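The compression strategy summarized in the abstract can be illustrated with a minimal sketch. It is not the released parser or its grammar: the rule table, the tag names, and the function `compress_parse` are hypothetical and chosen only for illustration. The sketch assumes that each rule matches two adjacent part-of-speech tags, names which side is the head, and that the dependent is removed from the working sequence as soon as the dependency is recorded, so that no token can receive a second head.

```python
from typing import List, Tuple

# Hypothetical rule table (an assumption, not the grammar from the paper):
# (left_tag, right_tag, head_side, relation)
RULES: List[Tuple[str, str, str, str]] = [
    ("DET", "NOUN", "right", "det"),
    ("ADJ", "NOUN", "right", "amod"),
    ("NOUN", "VERB", "right", "nsubj"),
    ("VERB", "NOUN", "left", "obj"),
]

Arc = Tuple[int, int, str]  # (head index, dependent index, relation)


def compress_parse(tags: List[str]) -> Tuple[List[Arc], List[int]]:
    """Apply the rules repeatedly, deleting each dependent once it is attached.

    Returns the recorded arcs and the indices of the tokens left over
    (a single index when the whole sentence has been compressed to its root).
    """
    remaining = list(range(len(tags)))  # working sequence of token indices
    arcs: List[Arc] = []
    changed = True
    while changed:
        changed = False
        for left_tag, right_tag, head_side, relation in RULES:
            i = 0
            while i < len(remaining) - 1:
                left, right = remaining[i], remaining[i + 1]
                if tags[left] == left_tag and tags[right] == right_tag:
                    head, dep = (right, left) if head_side == "right" else (left, right)
                    arcs.append((head, dep, relation))
                    # Compression step: the dependent leaves the sequence,
                    # which enforces the single-head constraint.
                    remaining.remove(dep)
                    changed = True
                else:
                    i += 1
    return arcs, remaining


# Usage: tags for a sentence such as "the cat chased a small mouse".
example_tags = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
example_arcs, example_rest = compress_parse(example_tags)
for head, dep, relation in example_arcs:
    print(f"{relation}: head={head} dependent={dep}")
print("remaining:", example_rest)
```

Because every attached dependent disappears from the sequence, later rules only ever see adjacent pairs, which is why a small set of simple patterns can cover dependencies that were originally separated by other tokens.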
