Proceedings of the International Workshop on Very-Large-Scale Multimedia Corpus, Mining and Retrieval

Welcome to the International Workshop on Very-Large-Scale Multimedia Corpus, Mining and Retrieval (VLS-MCMR'10). The purpose of this workshop is to bring together researchers interested in the construction and analysis of very-large-scale multimedia corpora, as well as the methodologies to mine and retrieve information from them. The workshop provides a forum to consolidate key issues related to research on very-large-scale multimedia datasets, such as the construction of datasets, the creation of ground truth, and the sharing and extension of such resources in terms of ground truth, features, algorithms and tools. The workshop will discuss and formulate action plans towards these goals.

This workshop welcomes contributions on the following topics:

- Construction, unification and evolution of corpora
- Frameworks for sharing datasets, ground truth, features, algorithms and tools
- Indexing and retrieval for large multimedia collections (including images, video, audio and other multi-modal systems)
- Large-scale video event and temporal analysis over diverse sources
- Automatic machine tagging, semantic annotation and object recognition on massive multimedia collections
- Interfaces for exploring, browsing and visualizing large multimedia collections
- Scalable and distributed machine learning and data mining methods for multimedia data
- Performance evaluation methodologies and standards
- Large-scale copy detection and near-duplicate detection
- Web-scale combined analysis of social and content networks
- Scalable and distributed systems for multimedia content analysis

Large-scale multimedia applications are among the potential topics for the "Multimedia Grand Challenge" hosted by ACM Multimedia 2010. The availability of large-scale corpora would effectively boost research in this direction and foster many new applications in the years to come.

The call for papers attracted 26 submissions from Asia, North America, Europe and Africa. The program committee accepted 10 high-quality papers. In addition, the program includes a panel on the topics addressed by the workshop and a keynote speech. We hope that the proceedings will serve as a valuable reference for multimedia researchers and developers, and encourage new research directions and results.

Looking over the papers accepted for the workshop, we observe three major trends. First, there are approaches that attempt to benefit from user-contributed data in order to facilitate modeling, mining and retrieval. Second, there are studies that focus on algorithmic issues related to the use of massively parallel computing facilities. Finally, there is work that addresses the scalability issues that arise when going very large scale.

Tong tackles large-scale image annotation using user-contributed annotations (tags, etc.) from social media networks; scalability is achieved through the use of the GRID'5000 computational resources. Zhou et al. identify relevant text terms from the text blocks that surround web images in order to improve the accuracy of web image annotation on a 5-million-image dataset. Wang et al. propose a deep model-based and data-driven (DMD) hybrid architecture for annotating images; it is shown that DMD scales up well, thanks to its sparse regularization and scalable supervised learning steps. Creating a corpus is both expensive and time-consuming. Liu and Huet propose a technique to automatically augment the training set for concept detector refinement.
Two kinds of information are used to select the training data: visual features, where video shots with high confidence scores are selected, and tags, which are used to filter out video shots not tagged with the concept. Wu et al. present an unsupervised, fully automatic algorithm for detecting commercials in broadcast TV; their solution is scalable and efficient for fast, large-scale, unsupervised commercial detection. Gudmund et al. revisit a cluster pruning algorithm, considering factors such as CPU/IO cost and memory constraints for large-scale copy detection; the method shows interesting clustering and retrieval computational costs when scaling up. Kosh presents processing and optimization strategies, along with a cost model, for integrating a similarity-based image join into a multimedia database. Nagy et al. address the scalability issues of visual-vocabulary-based image annotation algorithms as new object categories are added; to this end, a hierarchical approach is proposed based on class-specific vocabularies and a scoring function. Fan et al. report an extensive analysis of user behavior in online video streaming based on a large-scale trace database of online video access sessions. The study of the statistical characteristics of user behavior patterns shows that user behavior in a video access session is not only related to the content of the video, but is also strongly correlated with the behavior of previous access sessions. Wang and Merialdo propose an approach to boost the performance of Bag-of-Words video concept detection by assigning different weights to the visual words according to their informativeness for the detection of different concepts.