More Words and Bigger Pictures

Object recognition is a little like translation: a picture (text in a source language) goes in, and a description (text in a target language) comes out. I will use this analogy, which has proven fertile, to describe recent progress in object recognition. We have very good methods to spot some objects in images, but extending these methods to produce descriptions of images remains very difficult. The description might come in the form of a set of words indicating objects, together with the boxes or regions each object spans. This representation is difficult to work with, because some objects seem to be much more important than others, and because objects interact. An alternative is a sentence or a paragraph describing the picture, and recent work indicates how one might generate rich structures like this. Furthermore, recent work suggests that it is easier and more effective to generate descriptions of images in terms of chunks of meaning ("person on a horse") rather than just objects ("person"; "horse"). Finally, if the picture contains unfamiliar objects, then we need to generate descriptions useful enough to make it possible to interact with them, even though we don’t know what they are.

About the Speaker

David Forsyth is currently a full professor at U. Illinois at Urbana-Champaign, where he moved from U.C. Berkeley, where he was also a full professor. He has published over 130 papers on computer vision, computer graphics and machine learning. He has served as program chair and as general chair for various international conferences on computer vision. He received an IEEE Technical Achievement Award for 2005 for his research and became an IEEE Fellow in 2009. His textbook, "Computer Vision: A Modern Approach" (joint with J. Ponce and published by Prentice Hall), is widely adopted as a course text. A second edition appeared in 2011. He was named editor in chief of IEEE TPAMI for a term starting in Jan 2013.
