Improving parsing using morpho-syntactic and semantic information

In this paper we present our efforts to create a high-accuracy syntactic parser for Romanian sentences. Instead of building a parser from scratch, we decided to test freely available existing parsers and to tune them by feeding them linguistic information in the form of features (transitivity/intransitivity, semantic class, subcategorization frames, etc.). The parsers are trained and tested on the Romanian treebank in the Universal Dependencies format. As this is ongoing work, we report partial results of our endeavour: after adding only a few features, we obtained encouraging results. We also discuss further features that can be added to the parser to improve its performance. The final aim is a reliable tool for the syntactic analysis of sentences, both as a task per se and for use in various natural language processing applications.
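As an illustration of what feeding a parser with such features could look like, the sketch below appends lexicon-derived attributes to the FEATS column of CoNLL-U data before training. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function name `add_lexical_features`, the lexicon layout keyed by (lemma, UPOS), and the feature name `Transitivity` with value `Trans` are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical lexicon: (lemma, UPOS) -> extra features to inject.
# "Transitivity=Trans" is an illustrative name, not a standard UD feature.
Lexicon = Dict[Tuple[str, str], Dict[str, str]]


def add_lexical_features(conllu: str, lexicon: Lexicon) -> str:
    """Return CoNLL-U text with lexicon features merged into the FEATS column."""
    out = []
    for line in conllu.splitlines():
        cols = line.split("\t")
        # Keep comments, blank lines, multiword-token ranges (e.g. "1-2")
        # and empty nodes (e.g. "1.1") unchanged.
        if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
            out.append(line)
            continue
        extra = lexicon.get((cols[2], cols[3]), {})  # LEMMA, UPOS
        if extra:
            feats = {}
            if cols[5] != "_":
                feats = dict(f.split("=", 1) for f in cols[5].split("|"))
            feats.update(extra)
            # CoNLL-U expects features sorted alphabetically, case-insensitively.
            ordered = sorted(feats.items(), key=lambda kv: kv[0].lower())
            cols[5] = "|".join(f"{k}={v}" for k, v in ordered)
        out.append("\t".join(cols))
    return "\n".join(out)


sample = "\n".join([
    "# text = Maria citește cartea.",
    "1\tMaria\tMaria\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tcitește\tciti\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Pres\t0\troot\t_\t_",
    "3\tcartea\tcarte\tNOUN\t_\t_\t2\tobj\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])
lexicon: Lexicon = {("citi", "VERB"): {"Transitivity": "Trans"}}
enriched = add_lexical_features(sample, lexicon)
```

Writing the extra information into FEATS keeps the data in the same ten-column layout, so any parser that already reads the morphological features column can pick it up without a change of input format.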
