What happens where?

The explosion of geo-tagged images taken from mobile devices around the world is visually capturing life at amazingly high spatial, temporal, and semantic density. Cities cover only 3% of the Earth's landmass yet account for 50% of the world's population, and there the density of photos averages one photo per 18 square meters per year. In densely populated cities like New York City, London, Paris, San Francisco, and Tokyo, the figure reaches up to 40 photos per the same area. This means that computers can learn from photos not only what is where, in terms of fixed and persistent objects like landmarks and buildings, but also what happens where, in terms of dynamic and transient activities and events. The key idea is to combine semantic concept modeling of photos with geo-spatio-temporal localization to perform geo-semantic modeling. In this talk, we describe how different semantic activities and events can be defined using a geo-semantic ontology and used to create geo-semantic probability maps of what happens where from photos. We describe how this provides a powerful capability to learn visually from historical geo-tagged photos things like: "where is a good place to go fishing," "when and where are the street fairs in a location," "where is a good vantage spot for the yearly fireworks show," or even "where to meet people with a particular fashion or style."
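
To make the idea concrete, here is a minimal sketch of how a geo-semantic probability map might be estimated from a collection of geo-tagged, concept-scored photos. This is an illustrative assumption, not the authors' implementation: the record schema (latitude, longitude, month, concept label), the grid resolution, and the function names are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical grid resolution: ~100 m cells at mid-latitudes.
CELL_DEG = 0.001

def cell_of(lat, lon):
    """Quantize a lat/lon pair to a grid-cell index."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))

def geo_semantic_map(photos):
    """Estimate P(concept | cell, month) from raw photo counts.

    `photos` is an iterable of (lat, lon, month, concept) tuples,
    where `concept` is an activity/event label drawn from a
    geo-semantic ontology (e.g. "street_fair", "fishing").
    Returns {(cell, month): {concept: probability}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for lat, lon, month, concept in photos:
        counts[(cell_of(lat, lon), month)][concept] += 1
    probs = {}
    for key, concept_counts in counts.items():
        total = sum(concept_counts.values())
        probs[key] = {c: n / total for c, n in concept_counts.items()}
    return probs

# Example query: where and when do "street_fair" photos concentrate?
photos = [
    (40.7440, -73.9900, 9, "street_fair"),
    (40.7441, -73.9901, 9, "street_fair"),
    (40.7441, -73.9901, 9, "traffic"),
]
for (cell, month), dist in geo_semantic_map(photos).items():
    print(cell, "month", month, dist)
```

Answering a question like "when and where are the street fairs" then amounts to ranking (cell, month) bins by the estimated probability of the corresponding ontology concept.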
