Programmatic Link Grammar Induction for Unsupervised Language Learning

Although natural (i.e., human) languages do not appear to follow a strictly formal grammar, a formal grammar can approximate both the analysis of their structure and the generation of their sentences. Such a grammar is an important tool for programmatic language understanding. Because of the huge number of natural languages and their variants, processing tools that rely on human intervention are available only for the most widely used ones. We explore the problem of inducing, without supervision, a formal grammar for any language within the Link Grammar paradigm, starting from unannotated parses that are themselves obtained without supervision from an input corpus. We describe our state-of-the-art grammar induction technology and its evaluation techniques, and report preliminary results of applying it to both synthetic and real-world text corpora.
