Invited Talks

David Forsyth
University of Illinois at Urbana-Champaign

Biography: I am currently a full professor at U. Illinois at Urbana-Champaign, to which I recently moved from U.C. Berkeley, where I was also a full professor. I have published over 130 papers on computer vision, computer graphics and machine learning. I have served as program co-chair for IEEE Computer Vision and Pattern Recognition in 2000, general co-chair for CVPR 2006, and program co-chair for the European Conference on Computer Vision 2008, and I am a regular member of the program committee of all major international conferences on computer vision. I have served four years on the SIGGRAPH program committee, and am a regular reviewer for that conference. I have received best paper awards at the International Conference on Computer Vision and at the European Conference on Computer Vision. I received an IEEE Technical Achievement Award for 2005 for my research and became an IEEE Fellow in 2009. My recent textbook, "Computer Vision: A Modern Approach" (joint with J. Ponce and published by Prentice Hall), is now widely adopted as a course text (adoptions include MIT, U. Wisconsin-Madison, UIUC, Georgia Tech and U.C. Berkeley).

More Words and Bigger Pictures

Object recognition is a little like translation: a picture goes in (like text in a source language), and a description comes out (like text in a target language). I will use this analogy, which has proven fertile, to describe recent progress in object recognition.
We have very good methods to spot some objects in images, but extending these methods to produce descriptions of images remains very difficult. The description might come in the form of a set of words, indicating objects, together with the boxes or regions spanned by each object. This representation is difficult to work with, because some objects seem to be much more important than others, and because objects interact. An alternative is a sentence or a paragraph describing the picture, and recent work indicates how one might generate rich structures like this. Recent work also suggests that it is easier and more effective to generate descriptions of images in terms of chunks of meaning ("person on a horse") rather than just objects ("person"; "horse").
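The contrast between these two kinds of description can be made concrete. What follows is a minimal sketch, not taken from the talk, of a flat list of detected objects with boxes versus a single chunk of meaning; the class name, labels and coordinates are illustrative assumptions, and the point is only that the second form keeps the interaction ("on") that the first form discards.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class DetectedObject:
        # One recognized object: a word plus the box (x, y, width, height) it spans.
        label: str
        box: Tuple[int, int, int, int]

    # Representation 1: a set of words with boxes. Awkward to work with, because it
    # says nothing about which object matters most or how the objects interact.
    detections: List[DetectedObject] = [
        DetectedObject("person", (120, 40, 80, 200)),
        DetectedObject("horse", (60, 90, 260, 180)),
    ]

    # Representation 2: a chunk of meaning that binds the objects and their
    # interaction into one unit, closer to a sentence describing the picture.
    chunk_of_meaning = "person on a horse"

    print([d.label for d in detections])  # ['person', 'horse']
    print(chunk_of_meaning)               # person on a horse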
Finally, if the picture contains unfamiliar objects, we need to generate useful descriptions that make it possible to interact with them, even though we don't know what they are.