A Hybrid Approach to Identifying Objects from Verbal Descriptions

In this paper we present a hybrid approach to identifying objects in a scene from spoken descriptions. An integrated knowledge base, realized as a semantic network, is used to build conceptual descriptions both of a scene observed by cameras and of an utterance that may refer to one or more objects in that scene. The object identification problem is then solved using Bayesian networks. The matrices for the nodes and links of these networks are estimated from the results of psycholinguistic experiments with naive users.
