Bootstrapping structure into language: alignment-based learning

refined and abstract meanings largely grow out of more concrete meanings.
Bloomfield (1933)

This thesis introduces a new unsupervised learning framework, called Alignment-Based Learning (ABL), which is based on the alignment of sentences and Harris's (1951) notion of substitutability. Instances of the framework can be applied to an untagged, unstructured corpus of natural language sentences, resulting in a labelled, bracketed version of that corpus. Firstly, the alignment learning phase aligns all sentences in the corpus in pairs, partitioning each pair into parts that are equal in both sentences and parts that are unequal. Unequal parts of sentences can be seen as being substitutable for each other, since substituting one unequal part for the other results in another valid sentence. The unequal parts of the sentences are thus considered to be possible (possibly overlapping) constituents, called hypotheses. Secondly, the selection learning phase considers all hypotheses found by the alignment learning phase and selects the best of these. The hypotheses are selected based on the order in which they were found, or based on a probabilistic function. The framework can be extended with a grammar extraction phase. This extended framework is called parseABL. Besides returning a structured version of the unstructured input corpus, as the ABL system does, this system also returns a stochastic context-free or tree substitution grammar. Different instances of the framework have been tested on the English ATIS corpus, the Dutch OVIS corpus and the Wall Street Journal corpus. One of the interesting results, apart from the encouraging numerical results, is that all instances can (and do) learn recursive structures.
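The two phases described above can be illustrated with a minimal sketch. It is not the implementation from the thesis: Python's `difflib.SequenceMatcher` stands in for the alignment algorithm, "overlapping" is assumed to mean crossing brackets, and the function names `align_hypotheses`, `crosses` and `select_first` are hypothetical. Only order-based selection is shown; the probabilistic variant is left out.

```python
from difflib import SequenceMatcher


def align_hypotheses(sentence_a, sentence_b):
    """Alignment learning (sketch): align two sentences word by word and
    return the unequal parts as hypotheses, one list per sentence.
    Each hypothesis is a (start, end) word span, end exclusive."""
    a, b = sentence_a.split(), sentence_b.split()
    # Assumption: difflib's matcher replaces the thesis's alignment algorithm.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    hyps_a, hyps_b = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # equal parts are shared context, not hypotheses
        if i2 > i1:  # skip empty spans produced by pure insertions
            hyps_a.append((i1, i2))
        if j2 > j1:  # skip empty spans produced by pure deletions
            hyps_b.append((j1, j2))
    return hyps_a, hyps_b


def crosses(x, y):
    """True if spans x and y overlap without one containing the other."""
    return x[0] < y[0] < x[1] < y[1] or y[0] < x[0] < y[1] < x[1]


def select_first(hypotheses):
    """Selection learning (sketch): keep hypotheses in the order they were
    found and drop any later one that crosses an already kept one."""
    kept = []
    for h in hypotheses:
        if h not in kept and not any(crosses(h, k) for k in kept):
            kept.append(h)
    return kept


# Hypothetical example sentences, not taken from the corpora in the thesis.
hyps_a, hyps_b = align_hypotheses("show me flights from Boston",
                                  "show me fares from Boston")
print(hyps_a, hyps_b)  # [(2, 3)] [(2, 3)]: "flights" and "fares"

# Hypotheses for one sentence, collected over several alignments.
print(select_first([(0, 3), (2, 5), (1, 2)]))  # [(0, 3), (1, 2)]
```

In the first example the words "flights" and "fares" occupy the same context, so each becomes a hypothesis in its own sentence. In the second, the span (2, 5) is dropped because it crosses the earlier span (0, 3), while the nested span (1, 2) is kept.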
