Supercombinator set construction from a context-free representation of text

Grammars can be used for purposes other than representing a language. Grammar inference is a large field whose main goal is the construction of grammars from various sources. Written text can be analysed indirectly through such inferred grammars. Grammars inferred from processed text may grow into large structures, since the inference process can be continuous. We present a method to decompose grammars and store them as a non-redundant set of lambda calculus supercombinators. The decomposition is based on the structure of the grammar, and each distinct element is stored only once. The method can create such a set from any context-free grammar. To demonstrate this, and to show possible applications in natural language processing, we present a case study performed on samples from two books: the entire Book of Genesis from the King James Bible and the first 24 chapters of War and Peace by Tolstoy. We obtain context-free grammars with the Sequitur algorithm and then process them with our method. In all cases, the results show a significant decline in the number of grammar elements.
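To make the idea of storing each distinct grammar element only once more concrete, here is a minimal Python sketch. It assumes that the decomposition can be approximated by abstracting the nonterminals out of every right-hand side, so that each body becomes a closed template (loosely analogous to a supercombinator applied to its arguments), and then interning identical templates in a table. This is not the authors' construction. The grammar encoding and the functions `template_of`, `intern_grammar` and `instantiate` are hypothetical.

```python
# Hypothetical encoding (not from the paper): a grammar is a dict mapping each
# nonterminal to a list of alternatives, each alternative a list of symbols.
# A symbol is treated as a nonterminal exactly when it is a key of the dict.


def template_of(rhs, nonterminals):
    """Abstract the nonterminals out of one right-hand side.

    Returns (template, args). The template is a hashable tuple in which a
    terminal appears as ("t", symbol) and the i-th nonterminal as ("p", i);
    args lists the abstracted nonterminals in order of appearance.
    """
    template, args = [], []
    for sym in rhs:
        if sym in nonterminals:
            template.append(("p", len(args)))
            args.append(sym)
        else:
            template.append(("t", sym))
    return tuple(template), tuple(args)


def intern_grammar(grammar):
    """Store every distinct template once; rules keep (template index, args)."""
    table = {}  # template -> index, assigned in order of first appearance
    rules = {}
    for lhs, alternatives in grammar.items():
        rules[lhs] = []
        for rhs in alternatives:
            template, args = template_of(rhs, grammar)
            # setdefault assigns the next free index only to unseen templates
            index = table.setdefault(template, len(table))
            rules[lhs].append((index, args))
    templates = sorted(table, key=table.get)  # list ordered by index
    return templates, rules


def instantiate(template, args):
    """Rebuild the original right-hand side from a template and its args."""
    return [args[v] if kind == "p" else v for kind, v in template]


toy_grammar = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N": [["cat"], ["dog"]],
    "V": [["sees"]],
}

templates, rules = intern_grammar(toy_grammar)
print(len(templates), "distinct templates for",
      sum(len(alts) for alts in toy_grammar.values()), "right-hand sides")
# prints: 5 distinct templates for 6 right-hand sides
```

In the toy grammar, `S → NP VP` and `VP → V NP` have the same shape once their nonterminals are abstracted, so they share one stored template, and `instantiate` recovers every original right-hand side. A grammar produced by Sequitur could be encoded the same way, with exactly one alternative per nonterminal.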
