(Mis?-) Using DRT for Generation of Natural Language Text from Image Sequences

The abundance of geometric results from image sequence evaluation that is expected to become available shortly creates a new problem: how can this material be presented to a user without inundating them with unwanted details? A system design appears to become necessary that copes not only with image sequence evaluation but also with the increasing number of abstraction steps required for efficient presentation and inspection of results. The system-user interaction of a Computer Vision system should thus be designed as a natural language dialogue, placed within the overall system at what we call the ‘Natural Language Level’. Such a decision requires constructing a series of abstraction steps from geometric evaluation results to natural language text describing the contents of an image sequence. We suggest using Discourse Representation Theory as developed by [14] to design the system-internal representation of knowledge and results at the Natural Language Level. A first implementation of this approach is described, together with results obtained by applying it to image sequences recorded from real-world traffic scenes.

[1] Francine Chen, et al. Document image summarization without OCR, 1996, Proceedings of 3rd IEEE International Conference on Image Processing.

[2] Patrick Brézillon, et al. Lecture Notes in Artificial Intelligence, 1999.

[3] Hans-Hellmut Nagel, et al. Knowledge representation for the generation of quantified natural language descriptions of vehicle traffic in image sequences, 1996, Proceedings of 3rd IEEE International Conference on Image Processing.

[4] Hans-Hellmut Nagel, et al. Beginning a Transition from a Local to a More Global Point of View in Model-Based Vehicle Tracking, 1998, ECCV.

[5] Shaogang Gong, et al. Visual Surveillance in a Dynamic and Uncertain World, 1995, Artif. Intell.

[6] Thomas Rist, et al. On the Simultaneous Interpretation of Real World Image Sequences and their Natural Language Description: The System Soccer, 1988, ECAI.

[7] Michael A. Smith, et al. Video skimming and characterization through the combination of image and language understanding techniques, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8] Bernd Neumann, et al. NOAS: Ein System zur natürlichsprachlichen Beschreibung zeitveränderlicher Szenen, 1986, Inform. Forsch. Entwickl.

[9] Takeo Kanade, et al. Name-It: Naming and Detecting Faces in Video by the Integration of Image and Natural Language Processing, 1997, IJCAI.

[10] Hans-Hellmut Nagel, et al. Berechnung natürlichsprachlicher Beschreibungen von Straßenverkehrsszenen aus Bildfolgen unter Verwendung von Geschehens- und Verdeckungsmodellierungen, 1996, DAGM-Symposium.

[11] Zhi-Qiang Liu, et al. Picture Interpretation: A Symbolic Approach, 1995, Series in Machine Perception and Artificial Intelligence.

[12] J Starren, et al. Description generation of abnormal densities found in radiographs, 1995, Proceedings. Symposium on Computer Applications in Medical Care.

[13] Sandy Dance, et al. A concurrent, hierarchical approach to symbolic dynamic scene interpretation, 1996, Pattern Recognit.

[14] Jitendra Malik, et al. Automatic Symbolic Traffic Scene Analysis Using Belief Networks, 1994, AAAI.

[15] Gerd Herzog, et al. VIsual TRAnslator: Linking perceptions and natural language descriptions, 1994, Artificial Intelligence Review.

[16] Hans-Hellmut Nagel, et al. Integration of Image Sequence Evaluation and Fuzzy Metric Temporal Logic Programming, 1997, KI.

[17] Hans-Hellmut Nagel, et al. Ermittlung von begrifflichen Beschreibungen von Geschehen in Straßenverkehrsszenen mit Hilfe unscharfer Mengen, 1993, Informatik - Forschung und Entwicklung.

[18] Tomek Strzalkowski, et al. From Discourse to Logic, 1991.

[19] Hans-Hellmut Nagel, et al. 3D Pose Estimation by Directly Matching Polyhedral Models to Gray Value Gradients, 1997, International Journal of Computer Vision.