Converting Dependency Structure Into Persian Phrase Structure

A treebank is an important and useful resource in natural language processing, and it is annotated under one of two schemas: phrase structure or dependency structure. Many works convert a phrase structure into a dependency structure and vice versa. Most of them rely on handcrafted head percolation tables and argument tables applied in predefined deterministic ways. In this article, we propose a method to convert a dependency structure into a phrase structure by enriching the trainable model of a former hybrid-strategy approach. Adding a classifier to the algorithm and applying postprocessing modifications increases the quality of the conversion. We evaluate our method on two languages, English and Persian, and then analyze the errors. The results of our experiments show a 46.01% reduction in error rate for English and a 76.50% reduction for Persian compared to our baseline. We build a new phrase structure treebank by converting 10,000 sentences of the Persian dependency treebank into the corresponding phrase structures and correcting them manually.
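To make the kind of deterministic, table-driven conversion mentioned above concrete, the following is a minimal sketch and not the method proposed in this article. It assumes a projective dependency tree, a hypothetical `PROJECTION` table mapping part-of-speech tags to phrase labels, and a flat attachment of each head's dependents in surface order; the names `Token`, `PROJECTION`, and `convert` are all illustrative.

```python
from dataclasses import dataclass

# Hypothetical POS-to-phrase-label table (an assumption for illustration,
# not taken from the article or from any real treebank).
PROJECTION = {"V": "VP", "N": "NP", "P": "PP", "ADJ": "ADJP"}


@dataclass
class Token:
    index: int  # 1-based position in the sentence
    form: str   # surface word
    pos: str    # part-of-speech tag
    head: int   # index of the head token, 0 for the root


def convert(tokens):
    """Convert a projective dependency tree into a bracketed phrase structure.

    Each head that has dependents projects exactly one phrase; the head and
    its dependents are attached flat, ordered by surface position.
    """
    children = {}
    root = None
    for tok in tokens:
        if tok.head == 0:
            root = tok
        else:
            children.setdefault(tok.head, []).append(tok)
    if root is None:
        raise ValueError("no root token (head == 0) found")

    def build(tok):
        leaf = f"({tok.pos} {tok.form})"
        deps = children.get(tok.index, [])
        if not deps:
            return leaf  # a word without dependents stays a bare preterminal
        parts = sorted(deps + [tok], key=lambda t: t.index)
        inner = " ".join(leaf if p is tok else build(p) for p in parts)
        label = PROJECTION.get(tok.pos, "XP")  # fall back to a generic label
        return f"({label} {inner})"

    return build(root)


if __name__ == "__main__":
    sentence = [
        Token(1, "She", "PRON", 2),
        Token(2, "reads", "V", 0),
        Token(3, "old", "ADJ", 4),
        Token(4, "books", "N", 2),
    ]
    print(convert(sentence))
```

A fixed rule of this kind decides every projection and attachment level in advance, which is the point at which a trained classifier and postprocessing step, as described above, can make different choices.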
