A Computational Model of Language Learning Driven by Training Inputs

Language learning takes place in the linguistic environment that surrounds the learner, and variation in the training input to which a learner is exposed has been linked to differences in language learning. We explore how such differences in linguistic experience affect the learning of linguistic structural features, investigated here in a probabilistic graphical model. We gradually manipulate the amount of training input, composed of natural linguistic data from animation videos for children, from holistic (one-word expressions) to compositional (two- to six-word sentences). The recognition and generation of sentences are treated as a probabilistic constraint-satisfaction process based on massively parallel DNA chemistry. Random sentence generation succeeds when networks begin with limited sentence lengths and vocabulary sizes and gradually expand to larger ones, much as children's cognitive capacity develops during learning. This model supports the suggestion that variation in early linguistic environments, presented in developmental steps, may facilitate language acquisition.

Introduction

One of the critical aspects of language learning is that it develops. In contrast to traditional computational approaches, which treat language learning as innate rule learning and template matching, developmental accounts argue that infants' language learning depends on their environment, in particular the linguistic environment (Kaplan, Oudeyer, and Bergen 2008). We adopt a developmental modeling methodology that emphasizes computational learning in an incremental and open-ended way. Specifically, we explore the relationship between language capacity and the language environment (the training input). Elman (1993) showed that a gradual increase in attention span or, equivalently, a gradual increase in memory size allowed his neural networks to solve tasks that were unsolvable when starting with a 'full-grown' network. Following Elman, we let the agents themselves go through developmental stages and, in addition, we manipulate the world (the amount of training input). We consider a computer agent that takes in a stream of commercial video scripts for children step by step and progresses in language learning.

Specifically, we investigate the use of the DNA hypernetwork model for learning to generate sentences from a text collection of natural dialogues. Hypernetworks were originally proposed as an associative memory model inspired by, and realized in, molecular self-assembly (Zhang and Jang, 2006). A hypernetwork consists of a huge number of hyperedges, each of which links an arbitrary number of vertices and can therefore encode higher-order interactions or constraints among the variables. This view of hyperedges as constraints extends the application range of hypernetworks far beyond associative memory (Chen et al., 2005) to associative processors. Using a hypernetwork trained on a growing portion of the video corpus, the agent develops a concept for a given keyword through the associative memory-organizing mechanism, and we test its ability to generate sentences about that keyword based on the concept it has built. We simulate DNA hypernetworks that learn a language model and generate new sentences from a text corpus of approximately 30,000 sentences collected from animation videos for children, with the amount of input controlled across stages; a sketch of this staged exposure is given below. We then report experiments in which concepts develop and sentences are generated for a given keyword.
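As a concrete illustration of this staged exposure, the sketch below (in Python, with hypothetical names; the real system's 11-stage division by reading difficulty is not reproduced here) partitions a toy corpus into cumulative stages bounded by sentence length, from holistic one-word expressions up to compositional two- to six-word sentences.

```python
from typing import Dict, List, Sequence

def split_into_stages(sentences: List[str],
                      max_lengths: Sequence[int] = (1, 2, 3, 4, 5, 6)) -> Dict[int, List[str]]:
    """Group sentences into cumulative developmental stages by length.

    Stage k holds every sentence whose length in word tokens does not
    exceed max_lengths[k], so stage 0 contains only holistic one-word
    expressions and later stages add progressively longer, more
    compositional sentences (here up to six words).
    """
    stages: Dict[int, List[str]] = {k: [] for k in range(len(max_lengths))}
    for sentence in sentences:
        n_words = len(sentence.split())
        for k, limit in enumerate(max_lengths):
            if n_words <= limit:
                stages[k].append(sentence)
    return stages

# Toy corpus standing in for the animation-video scripts.
corpus = ["look", "good morning", "where is my ball",
          "the little dog runs fast", "i want to play outside today"]
for k, sents in split_into_stages(corpus).items():
    print(f"stage {k}: {len(sents)} sentences")
```

Each stage strictly contains the previous one, so the learner's training set only ever grows, mirroring the gradual expansion of sentence length and vocabulary described above.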
Through these experiments, we check our expectation that the hypernetwork can show some language learning capacity and that this capacity is clearly and closely related to its language environment.

Learning Based on the Data

The experiments in this paper are based on commercial video scripts for children. For the training text corpus, we used commercial and educational animation video scripts for children.* The corpus consists of S = 32,744 sentences excerpted from animation video scripts used in educational curricula for children from three to seven years old. The script data are divided into 11 learning stages classified by reading difficulty level. The corpus contains 6,124 word types and 252,936 word tokens.

* The titles of the video materials are: Miffy, Looney the Tune, Caillou, Dora Dora, McDonald, Timothy, Kitty, Snoopy. The learning order of the materials follows the recommended consumer age of each video product.

Consider a language learner that takes in a stream of linguistic data. It interacts with the linguistic data online, develops initial concepts for specific linguistic items, and builds an internal representational semantic structure for each given stimulus. Figure 1 shows a high-level sketch of the complete model.

Figure 1. Process of generating a new sentence from a keyword (in this case, the keyword is "beautiful"). The given keyword is extended by assembling a new word onto the left and right ends of the existing partial sentence.

The intuition behind this architecture is as follows. The language learner takes in a stream of commercial video scripts for children step by step and progresses in language learning. A linguistic hypernetwork, acting as the language learner, represents a probabilistic model of the data set using a population of hyperedges and their weights. See Figure 2 for an example of such a concept map and of how it maintains a set of associated concepts that are not individually interpretable but that gradually come to govern the semantic coherence of the learner's language.

The task is to learn a language model $P(S) = P(\mathbf{x}) = P(x_1, \ldots, x_n)$ from a collection of example sentences $D = \{\mathbf{x}\}$. Given a list of query words or a query sentence $\mathbf{x}$, the model is to generate a (potentially) new sentence. To solve the language generation problem, we estimate the joint probability of words, $P(\mathbf{x}) = P(x_1, x_2, \ldots, x_n)$, as a language model. Given a query sentence with the $i$-th position blank,
$$\mathbf{x}_{-i}^{(q)} = (x_1^{(q)}, x_2^{(q)}, \ldots, x_{i-1}^{(q)}, x_{i+1}^{(q)}, \ldots, x_n^{(q)}),$$
serving as the context or history $h$, the model is used to choose the word
$$x_i^{*} = \arg\max_{x_i} P(x_i \mid \mathbf{x}_{-i}^{(q)}) = \arg\max_{x_i} P(x_i \mid h),$$
where $x_i^{*}$ is the word that maximizes the conditional probability. Conventional statistical language models estimate the probability of a sentence $S$ by using the chain rule to decompose it into a product of conditional probabilities:
$$P(S) = P(\mathbf{x}) = P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1}).$$
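To make the generation step above concrete, here is a minimal sketch in Python. It is not the DNA hypernetwork itself: all class and method names (ToyHyperedgeModel, _best_neighbor, generate) are hypothetical, hyperedges are approximated as weighted word bigrams counted from the training sentences, and the keyword is extended to the left and right, as in Figure 1, by an argmax over the weights standing in for the conditional probabilities $P(x_i \mid h)$.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

class ToyHyperedgeModel:
    """Simplified stand-in for the linguistic hypernetwork.

    Hyperedges are approximated as contiguous word bigrams counted from
    the training sentences; a hyperedge's weight is its count. Sentence
    generation grows a partial sentence outward from a keyword, at each
    step choosing the boundary word whose matching hyperedge has the
    highest weight (an argmax standing in for P(x_i | h)).
    """

    def __init__(self) -> None:
        self.edges: Dict[Tuple[str, str], int] = defaultdict(int)

    def train(self, sentences: List[str]) -> None:
        for s in sentences:
            words = s.split()
            for i in range(len(words) - 1):
                self.edges[(words[i], words[i + 1])] += 1

    def _best_neighbor(self, word: str, side: str) -> Optional[str]:
        # Candidate words that co-occur with `word` on the requested side.
        if side == "right":
            candidates = {b: w for (a, b), w in self.edges.items() if a == word}
        else:
            candidates = {a: w for (a, b), w in self.edges.items() if b == word}
        if not candidates:
            return None
        return max(candidates, key=candidates.get)

    def generate(self, keyword: str, max_len: int = 6) -> str:
        sentence = [keyword]
        while len(sentence) < max_len:
            left = self._best_neighbor(sentence[0], side="left")
            right = self._best_neighbor(sentence[-1], side="right")
            if left is None and right is None:
                break
            if right is not None:
                sentence.append(right)          # extend to the right
            if left is not None and len(sentence) < max_len:
                sentence.insert(0, left)        # extend to the left
        return " ".join(sentence)

# Toy usage with a few sentences standing in for the video-script corpus.
model = ToyHyperedgeModel()
model.train(["what a beautiful day", "the beautiful flower is red",
             "it is a beautiful morning"])
print(model.generate("beautiful"))   # e.g. "what a beautiful day"
```

In the model described in this paper, hyperedges of arbitrary order and a constraint-satisfaction process based on massively parallel DNA chemistry take the place of the fixed bigram counts used in this sketch.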

References

[1] Steels, L., et al. The emergence and evolution of linguistic structure: from lexical to grammatical communication systems. Connection Science, 2005.

[2] Rosenfeld, R., et al. A maximum entropy approach to adaptive statistical language modelling. Computer Speech and Language, 1996.

[3] Zhang, B.-T., et al. Molecular Learning of wDNF Formulae. DNA Computing, 2005.

[4] Chomsky, N. Reflections on Language. 1977.

[5] Kirby, S., et al. Complex Systems in Language Evolution: the Cultural Emergence of Compositional Structure. Advances in Complex Systems, 2003.

[6] Suter, S. Meaningful differences in the everyday experience of young American children. European Journal of Pediatrics, 2005.

[7] Oudeyer, P.-Y., et al. Computational Models in the Debate over Language Learnability. 2008.

[8] Elman, J., et al. Language input and semantic categories: a relation between cognition and early word learning. Journal of Child Language, 2006.

[9] Vogt, P., et al. On the Acquisition and Evolution of Compositional Languages: Sparse Input and the Productive Creativity of Children. Adaptive Behavior, 2005.

[10] Zhang, B.-T., et al. Hypernetworks: A Molecular Evolutionary Architecture for Cognitive Learning and Memory. IEEE Computational Intelligence Magazine, 2008.

[11] Rosenfeld, R. Two decades of statistical language modeling: where do we go from here? Proceedings of the IEEE, 2000.

[12] Zhang, B.-T., et al. Self-Assembling Hypernetworks for Cognitive Learning of Linguistic Memory. 2008.

[13] Elman, J. Learning and development in neural networks: the importance of starting small. Cognition, 1993.

[14] Elman, J. Connectionist models of cognitive development: where next? Trends in Cognitive Sciences, 2005.