Fusion of knowledge-based and data-driven approaches to grammar induction

Using different sources of information for grammar induction results in grammars that vary in coverage and precision. Fusing such grammars with a strategy that exploits their strengths while minimizing their weaknesses is expected to produce grammars with superior performance. We focus on the fusion of grammars produced by a knowledge-based approach that uses lexicalized ontologies and a data-driven approach that uses semantic similarity clustering. We propose various algorithms for finding the mapping between the (non-terminal) rules generated by each grammar induction algorithm, followed by rule fusion. Three fusion approaches are investigated: early, mid, and late fusion. Results show that late fusion performs best, yielding a 20% relative improvement in F-measure.
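The abstract does not specify the rule-mapping or rule-fusion algorithms. Purely as an illustration of the general idea, the sketch below makes several assumptions that are not taken from this work. Each grammar is modeled as a dictionary from non-terminal names to sets of terminal phrases. Each knowledge-based rule is mapped to its most similar data-driven rule by Jaccard overlap, which stands in for whatever similarity the actual algorithms use. Mapped rules are then fused by taking the union of their phrases. The names `map_rules`, `fuse`, and `jaccard`, the `threshold` value, and the example rules are all hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical representation: non-terminal name -> set of terminal phrases.
Grammar = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two phrase sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_rules(g1: Grammar, g2: Grammar, threshold: float = 0.2) -> Dict[str, str]:
    """Map each rule of g1 to its most similar rule in g2.

    Assumption: similarity is Jaccard overlap of the rules' phrase sets, and a
    mapping is kept only if the best score reaches `threshold`.
    """
    mapping: Dict[str, str] = {}
    for name1, phrases1 in g1.items():
        best_name: Optional[str] = None
        best_score = 0.0
        for name2, phrases2 in g2.items():
            score = jaccard(phrases1, phrases2)
            if score > best_score:
                best_name, best_score = name2, score
        if best_name is not None and best_score >= threshold:
            mapping[name1] = best_name
    return mapping


def fuse(g1: Grammar, g2: Grammar, mapping: Dict[str, str]) -> Grammar:
    """Fuse two grammars: mapped rules take the union of their phrases,
    and unmapped rules from either grammar are carried over unchanged."""
    fused: Grammar = {}
    for name1, phrases1 in g1.items():
        if name1 in mapping:
            fused[name1] = phrases1 | g2[mapping[name1]]
        else:
            fused[name1] = set(phrases1)
    used = set(mapping.values())
    for name2, phrases2 in g2.items():
        if name2 in used:
            continue
        # Avoid silently overwriting a g1 rule that happens to share the name.
        key = name2 if name2 not in fused else name2 + "_b"
        fused[key] = set(phrases2)
    return fused


if __name__ == "__main__":
    # Toy, made-up rules: a knowledge-based grammar and a data-driven one.
    kb = {"<CITY>": {"london", "paris", "rome"}, "<DAY>": {"monday", "friday"}}
    dd = {"C1": {"paris", "rome", "berlin"}, "C2": {"tomorrow", "today"}}
    mapping = map_rules(kb, dd)
    print(mapping)
    print(fuse(kb, dd, mapping))
```

Taking the union of mapped rules favors coverage; a precision-oriented variant could instead keep only the phrases shared by both rules.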
