Information and Incrementality in Syntactic Bootstrapping

Title of dissertation: INFORMATION AND INCREMENTALITY IN SYNTACTIC BOOTSTRAPPING
Aaron Steven White, Doctor of Philosophy, 2015
Dissertation directed by: Professor Valentine Hacquard, Department of Linguistics

Some words are harder to learn than others. For instance, action verbs like run and hit are learned earlier than propositional attitude verbs like think and want. One reason think and want might be learned later is that, whereas we can see and hear running and hitting, we can’t see or hear thinking and wanting. Children nevertheless learn these verbs, so a route other than the senses must exist. There is mounting evidence that this route involves, in large part, inferences based on the distribution of syntactic contexts a propositional attitude verb occurs in, a process known as syntactic bootstrapping. This fact makes the domain of propositional attitude verbs a prime proving ground for models of syntactic bootstrapping.

With this in mind, this dissertation has two goals: on the one hand, it aims to construct a computational model of syntactic bootstrapping; on the other, it aims to use this model to investigate the limits on the amount of information about propositional attitude verb meanings that can be gleaned from syntactic distributions. I show throughout the dissertation that these goals are mutually supportive.

In Chapter 1, I set out the main problems that drive the investigation. In Chapters 2 and 3, I use both psycholinguistic experiments and computational modeling to establish that there is a significant amount of semantic information carried in both participants’ syntactic acceptability judgments and syntactic distributions in corpora. To investigate the nature of this relationship, I develop two computational models: (i) a nonnegative model of (semantic-to-syntactic) projection and (ii) a nonnegative model of syntactic bootstrapping. In Chapter 4, I use a novel variant of the Human Simulation Paradigm to show that the information carried in syntactic distribution is actually utilized by (simulated) learners. In Chapter 5, I present a proposal for how to solve a standing problem in how syntactic bootstrapping accounts for certain kinds of cross-linguistic variation. And in Chapter 6, I conclude with some future directions for this work.
