Learning Language from a Large (Unannotated) Corpus

This paper describes a novel approach to the fully automated, unsupervised extraction of dependency grammars, and of the associated mappings from syntax to semantic relationships, from large text corpora. The approach builds on the authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of earlier papers and approaches from the statistical language learning literature. If successful, it would make it possible to mine all the information needed to power a natural language comprehension and generation system directly from a large, unannotated corpus.
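As a minimal illustration of what unsupervised dependency extraction from raw text can look like, the following Python sketch shows one classic technique from the statistical language learning literature, lexical attraction in the style of Yuret. It counts word pairs that co-occur within a small window of an unannotated corpus, scores each pair by pointwise mutual information (PMI), and links the words of a sentence with a maximum spanning tree over those scores. The function names (`pair_counts`, `pmi_table`, `mst_parse`) and the default window size are hypothetical choices made for this sketch. It is not presented as the authors' method.

```python
import math
from collections import Counter


def pair_counts(sentences, window=3):
    """Count ordered word pairs (left, right) whose distance is at most `window`."""
    pairs = Counter()
    for sent in sentences:
        for i, left in enumerate(sent):
            for j in range(i + 1, min(i + 1 + window, len(sent))):
                pairs[(left, sent[j])] += 1
    return pairs


def pmi_table(pairs):
    """PMI log2(p(l, r) / (p(l, *) * p(*, r))) for every observed ordered pair."""
    total = sum(pairs.values())
    left, right = Counter(), Counter()
    for (l, r), c in pairs.items():
        left[l] += c   # marginal count of l in the left slot
        right[r] += c  # marginal count of r in the right slot
    return {
        (l, r): math.log2(c * total / (left[l] * right[r]))
        for (l, r), c in pairs.items()
    }


def mst_parse(sentence, pmi):
    """Link word positions with a maximum spanning tree (Prim's algorithm).

    Edge weight is the PMI of (earlier word, later word); pairs never seen
    in the corpus get -inf, so they are used only if nothing else connects.
    Returns the links as sorted (i, j) index pairs with i < j.
    """
    n = len(sentence)
    if n < 2:
        return []

    def weight(i, j):
        a, b = min(i, j), max(i, j)
        return pmi.get((sentence[a], sentence[b]), float("-inf"))

    in_tree = {0}
    links = []
    while len(in_tree) < n:
        # Pick the highest-weight edge from the tree to a word not yet in it.
        _, i, j = max(
            (weight(i, j), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        in_tree.add(j)
        links.append((min(i, j), max(i, j)))
    return sorted(links)


if __name__ == "__main__":
    corpus = [s.split() for s in [
        "the dog ran", "the cat ran", "a dog barked", "the dog barked",
    ]]
    scores = pmi_table(pair_counts(corpus))
    print(mst_parse("the dog barked".split(), scores))
```

PMI rewards word pairs that co-occur more often than their individual frequencies would predict, which is why it can serve as an annotation-free signal of syntactic attachment. This toy version does not enforce the no-crossing-links (projectivity) constraint that parsers such as Link Grammar impose, and a realistic run would need counts from a far larger corpus.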
