Training a Multilingual Sportscaster: Using Perceptual Context to Learn Language

We present a novel framework for learning to interpret and generate language using only perceptual context as supervision. We demonstrate its capabilities by developing a system that learns to sportscast simulated robot soccer games in both English and Korean without any language-specific prior knowledge. Training employs only ambiguous supervision consisting of a stream of descriptive textual comments and a sequence of events extracted from the simulation trace. The system simultaneously establishes correspondences between individual comments and the events that they describe while building a translation model that supports both parsing and generation. We also present a novel algorithm for learning which events are worth describing. Human evaluations of the generated commentaries indicate they are of reasonable quality and in some cases even on par with those produced by humans for our limited domain.

[1]  B. L. Whorf Language, Thought, and Reality: Selected Writings of Benjamin Lee Whorf , 1956 .

[2]  Alfred V. Aho,et al.  The Theory of Parsing, Translation, and Compiling , 1972 .

[3]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[4]  Stuart M. Shieber,et al.  A Uniform Architecture for Parsing and Generation , 1988, COLING.

[5]  John Cocke,et al.  A Statistical Approach to Machine Translation , 1990, CL.

[6]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[7]  F. Quimby What's in a picture? , 1993, Laboratory animal science.

[8]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[9]  Saso Dzeroski,et al.  Inductive Logic Programming: Techniques and Applications , 1993 .

[10]  Debra T. Burhans,et al.  Visual Semantics: Extracting Visual information from Text Accompanying Pictures , 1994, AAAI.

[11]  Andreas Stolcke,et al.  An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities , 1994, CL.

[12]  Vasileios Hatzivassiloglou,et al.  Two-Level, Many-Paths Generation , 1995, ACL.

[13]  J. Siskind A computational study of cross-situational techniques for learning word-to-meaning mappings , 1996, Cognition.

[14]  Takeo Kanade,et al.  Name-It: Naming and Detecting Faces in Video by the Integration of Image and Natural Language Processing , 1997, IJCAI.

[15]  Jerome A. Feldman,et al.  Modeling Embodied Lexical Development , 1997 .

[16]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[17]  Mitchell P. Marcus,et al.  Adding Semantic Annotation to the Penn TreeBank , 1998 .

[18]  Nancy Ide,et al.  Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art , 1998, Comput. Linguistics.

[19]  Thomas Rist,et al.  Three RoboCup Simulation League Commentator Systems , 2000, AI Mag..

[20]  Mark Johnson,et al.  Lexicalized Stochastic Modeling of Constraint-Based Grammars using Log-Linear Measures and EM Training , 2000, ACL.

[21]  Nello Cristianini,et al.  Classification using String Kernels , 2000 .

[22]  Kevin Knight,et al.  A Syntax-based Statistical Translation Model , 2001, ACL.

[23]  Regina Barzilay,et al.  Bootstrapping Lexical Choice via Multiple-Sequence Alignment , 2002, EMNLP.

[24]  Deb K. Roy,et al.  Learning visually grounded words and syntax for a scene description task , 2002, Comput. Speech Lang..

[25]  George R. Doddington,et al.  Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics , 2002 .

[26]  Martha Palmer,et al.  Adding predicate argument structure to the Penn TreeBank , 2002 .

[27]  K. McKeown,et al.  Discourse Strategies for Generating Natural-Language Text , 1985, Artif. Intell..

[28]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[29]  Michael Collins,et al.  New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron , 2002, ACL.

[30]  Kathleen McKeown,et al.  Statistical Acquisition of Content Selection Rules for Natural Language Generation , 2003, EMNLP.

[31]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[32]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[33]  Dmitry Zelenko,et al.  Kernel Methods for Relation Extraction , 2002, J. Mach. Learn. Res..

[34]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[35]  Gerd Herzog,et al.  VIsual TRAnslator: Linking perceptions and natural language descriptions , 1994, Artificial Intelligence Review.

[36]  Chen Yu,et al.  On the Integration of Grounding Language and Learning Objects , 2004, AAAI.

[37]  Alon Lavie,et al.  METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[38]  Jean Carletta,et al.  Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization , 2005, ACL 2005.

[39]  Razvan C. Bunescu,et al.  Subsequence Kernels for Relation Extraction , 2005, NIPS.

[40]  Luke S. Zettlemoyer,et al.  Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars , 2005, UAI.

[41]  Raymond J. Mooney,et al.  A Statistical Semantic Parser that Integrates Syntax and Semantics , 2005, CoNLL.

[42]  Hugo Zaragoza,et al.  Learning What to Talk about in Descriptive Games , 2005, HLT/EMNLP.

[43]  David Chiang,et al.  A Hierarchical Phrase-Based Model for Statistical Machine Translation , 2005, ACL.

[44]  Mirella Lapata,et al.  Collective Content Selection for Concept-to-Text Generation , 2005, HLT.

[45]  Deb Roy,et al.  Speaking with your Sidekick: Understanding Situated Speech in Computer Role Playing Games , 2005, AIIDE.

[46]  Rohit J. Kate,et al.  Using String-Kernels for Learning Semantic Parsers , 2006, ACL.

[47]  Raymond J. Mooney,et al.  Learning for Semantic Parsing with Statistical Machine Translation , 2006, NAACL.

[48]  S. Harnad Symbol grounding problem , 1991, Scholarpedia.

[49]  Raymond J. Mooney,et al.  Learning for Semantic Parsing , 2009, CICLing.

[50]  Deb Roy,et al.  Situated Models of Meaning for Sports Video Retrieval , 2007, NAACL.

[51]  Luke S. Zettlemoyer,et al.  Online Learning of Relaxed CCG Grammars for Parsing to Logical Form , 2007, EMNLP.

[52]  Raymond J. Mooney,et al.  Generation by Inverting a Semantic Parser that Uses Statistical Machine Translation , 2007, NAACL.

[53]  Regina Barzilay,et al.  Database-Text Alignment via Structured Multilabel Classification , 2007, IJCAI.

[54]  Brian Scassellati,et al.  A Robot That Uses Existing Vocabulary to Infer Non-Visual Word Meanings from Observation , 2007, AAAI.

[55]  Rohit J. Kate,et al.  Learning Language Semantics from Ambiguous Supervision , 2007, AAAI.

[56]  Raymond J. Mooney,et al.  Learning to sportscast: a test of grounded language acquisition , 2008, ICML '08.

[57]  Sebastian Riedel,et al.  Spoken Language Understanding in dialogue systems, using a 2-layer Markov Logic Network: improving semantic accuracy , 2008 .

[58]  Wil M. P. van der Aalst,et al.  Analyzing Multi-agent Activity Logs Using Process Mining Techniques , 2008, DARS.

[59]  Hwee Tou Ng,et al.  A Generative Model for Parsing Natural Language to Meaning Representations , 2008, EMNLP.

[60]  Paul R. Cohen,et al.  Learning and Playing in Wubble World , 2008, AIIDE.

[61]  Jeffrey Nichols,et al.  Interpreting Written How-To Instructions , 2009, IJCAI.

[62]  Raymond J. Mooney,et al.  Learning a Compositional Semantic Parser using an Existing Syntactic Parser , 2009, ACL.

[63]  Raymond J. Mooney,et al.  Using closed captions to train activity recognizers that improve video retrieval , 2009, 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[64]  Milica Gasic,et al.  Transformation-based learning for semantic parsing , 2009, INTERSPEECH.

[65]  Dan Klein,et al.  Learning Semantic Correspondences with Less Supervision , 2009, ACL.

[66]  Luke S. Zettlemoyer,et al.  Reinforcement Learning for Mapping Instructions to Actions , 2009, ACL.

[67]  L. ChenDavid,et al.  Training a multilingual sportscaster , 2010 .