Learning Language from Perceptual Context

Current systems that learn to process natural language require laboriously constructed human-annotated training data. Ideally, a computer would be able to acquire language like a child by being exposed to linguistic input in the context of a relevant but ambiguous perceptual environment. As a step in this direction, we present a system that learns to sportscast simulated robot soccer games by example. The training data consists of textual human commentaries on RoboCup simulation games. A set of possible alternative meanings for each comment is automatically constructed from game event traces. Our previously developed systems for learning to parse and generate natural language (KRISP and WASP) were augmented to learn from this data and then commentate novel games. The system is evaluated on its ability to parse sentences into correct meanings and to generate accurate descriptions of game events. Human judges also rated the overall quality of the generated sportscasts, and these ratings were compared with those for human-generated commentaries.
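The ambiguous supervision described above, where each comment is paired with several candidate meanings drawn from the event trace, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the alternatives are selected, so the time-window rule, the 5-second default, the `Event` and `Comment` types, and the meaning strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float    # seconds into the game (assumed unit)
    meaning: str   # hypothetical meaning representation


@dataclass(frozen=True)
class Comment:
    time: float    # time at which the comment was made
    text: str


def candidate_meanings(comments, events, window=5.0):
    """Pair each comment with every event in the preceding `window` seconds.

    The window rule is an assumption made for this sketch; it stands in for
    whatever procedure builds the set of alternative meanings.
    """
    pairs = []
    for c in comments:
        candidates = [
            e.meaning for e in events if 0.0 <= c.time - e.time <= window
        ]
        pairs.append((c.text, candidates))
    return pairs


# Toy event trace and commentary (invented for illustration only).
events = [
    Event(10.0, "pass(purple7, purple9)"),
    Event(12.0, "kick(purple9)"),
    Event(30.0, "badPass(purple9, pink3)"),
]
comments = [
    Comment(13.0, "purple7 passes to purple9"),
    Comment(31.0, "purple9 loses the ball to pink3"),
]

training_pairs = candidate_meanings(comments, events)
for text, candidates in training_pairs:
    print(text, "->", candidates)
```

In this toy trace the first comment receives two candidate meanings, only one of which is correct. A learner trained on such pairs has to resolve that ambiguity itself, which is the setting the abstract describes.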
