Social media mining and search

Recent years have witnessed the proliferation of social media and the success of many social websites, including Flickr, YouTube, Facebook, and Twitter, which have drastically increased the volume of community-shared media resources such as images and videos. These websites allow users not only to create and share media data but also to rate and annotate it. As a result, a wealth of meta-data associated with multimedia resources, such as user-provided tags, comments, geo-tags, capture time, and EXIF information, is available on social media websites.

On the one hand, the rapid growth of social media data makes many related applications challenging, such as categorization, recommendation, and search. On the other hand, the rich information cues associated with the data offer opportunities to attack well-recognized difficulties in multimedia analysis and understanding, e.g., the insufficiency of labeled data for semantic learning.

The multimedia research community has widely recognized the importance of learning effective models for multimedia understanding, organization, and access, but progress has been slow because labeled training data is scarce: such labels typically come from users through an interactive, labor-intensive manual process. To reduce this manual effort, many semi-supervised learning and active learning approaches have been proposed. Nevertheless, a large set of images or videos must still be manually annotated to bootstrap and steer the training. The rich information cues associated with multimedia data on social media websites offer a way out: if we can learn models for semantic concepts effectively from user-shared data by using the associated meta-text as training labels, or infer the semantic concepts of multimedia data directly from the data on social media websites, the reliance on manual annotation can be greatly reduced.
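To make the tags-as-training-labels idea concrete, the following minimal sketch treats the presence of a user-provided tag as a noisy positive label and fits an off-the-shelf linear classifier for one concept. It is purely illustrative, not a method from this issue: the features are synthetic stand-ins for real image descriptors, the tag-noise model is an assumption, and the scikit-learn classifier is an arbitrary choice.

```python
# Minimal sketch: learning a semantic-concept classifier from user tags
# treated as noisy labels. All data here is synthetic; in practice the
# feature vectors would come from an image descriptor and the tags from
# a photo-sharing site such as Flickr.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical corpus: 1000 user-shared images with 64-dim visual features.
n_images, n_dims = 1000, 64
features = rng.normal(size=(n_images, n_dims))
concept = "sunset"  # the semantic concept we want a model for

# Simulate noisy social tagging: images whose first feature is large are
# "true" positives, but users tag imperfectly (missing or spurious tags),
# so 20% of the tag decisions are replaced by coin flips.
true_label = (features[:, 0] > 0.5).astype(int)
tag_present = np.where(rng.random(n_images) < 0.8,
                       true_label,
                       rng.integers(0, 2, n_images))
tags = [{concept} if t else set() for t in tag_present]

# Tags-as-labels: y = 1 iff the concept tag appears on the image.
y = np.array([int(concept in t) for t in tags])

X_train, X_test, y_train, y_test = train_test_split(
    features, y, test_size=0.25, random_state=0)

# Train on the noisy tag labels; no manual annotation is involved.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out agreement with tags: {clf.score(X_test, y_test):.2f}")
```

The point of the sketch is only that the labels come for free from user behavior rather than from paid annotators; handling the label noise more carefully is exactly the kind of problem the papers in this issue address.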