Content-based multimedia retrieval: lessons learned from two decades of research

In the past two decades, we have witnessed burgeoning research on content-based multimedia information retrieval, covering a wide range of topics such as feature extraction, content matching, structure parsing, semantic annotation, multimodal analysis, 3D content retrieval, and user-in-the-loop interaction. More than ten years have also passed since the publication of the influential survey paper by Smeulders et al. on content-based image retrieval. Recently, exciting solutions have been emerging in several practical contexts such as mobile media search, augmented reality, and Web-scale copy detection. However, many fundamental problems remain open, including but not limited to large-scale semantic annotation, multimedia ontological organization, and human-machine interaction for searching complex events. In this talk, I will discuss lessons learned from our past research, drawing on successes and failures in developing and deploying a few image/video search systems in different domains, and then share views on promising future directions.