An Audio and Image-Based On-Demand Content Annotation Framework for Augmenting the Video Viewing Experience on Mobile Devices

The availability of annotated multimedia content is a crucial requirement for a number of applications. In education, it could support the automatic summarization of recorded lessons or the retrieval of learning material; in entertainment, it could serve to recommend audio and video resources based on users' tastes. In this work, a framework is presented that augments the video viewing experience on mobile devices by means of image- and text-based annotations extracted on demand from Wikipedia. Speech recognition is exploited to periodically obtain text snippets from the audio track of the video currently displayed on the mobile device, while query-by-image search is used to generate a text summary of extracted video frames. The keywords obtained are processed with semantic techniques to find named entities associated with the multimedia content, which are then superimposed on the video and displayed to the user in a synchronized way. The promising results obtained with a prototype implementation show the feasibility of the proposed solution, which could be combined with other systems, e.g., providing information about the user's location, preferences, etc., to build more sophisticated context-aware applications.
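The annotation flow described above can be illustrated with a minimal sketch. It assumes a speech recognizer has already produced timestamped transcript snippets, and it uses a small hypothetical in-memory entity index in place of a real Wikipedia lookup (a full system would query the MediaWiki API or a short-text annotator in the spirit of TAGME); the names `Annotation`, `ENTITY_INDEX`, and `annotate_snippets` are illustrative, not part of the described framework.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    start: float   # seconds into the video where the overlay appears
    end: float     # seconds into the video where the overlay is removed
    entity: str    # Wikipedia page title of the matched named entity
    summary: str   # short text superimposed on the video

# Hypothetical stand-in for a Wikipedia entity lookup: maps a lowercase
# surface mention to a (page title, short summary) pair.
ENTITY_INDEX = {
    "eiffel tower": ("Eiffel Tower", "Wrought-iron lattice tower in Paris."),
    "paris": ("Paris", "Capital and largest city of France."),
}


def annotate_snippets(snippets):
    """Map timestamped transcript snippets to synchronized annotations.

    `snippets` is a list of (start, end, text) tuples, e.g. produced by a
    speech recognizer over successive windows of the video's audio track.
    """
    annotations = []
    for start, end, text in snippets:
        lowered = text.lower()
        for mention, (title, summary) in ENTITY_INDEX.items():
            if mention in lowered:
                # Reuse the snippet's time span so the overlay is shown
                # while the entity is being mentioned.
                annotations.append(Annotation(start, end, title, summary))
    # Sort by start time so overlays can be rendered in playback order.
    annotations.sort(key=lambda a: a.start)
    return annotations
```

A player component would consume the returned list during playback, drawing each `summary` over the video between `start` and `end`; the same structure could hold entities derived from query-by-image results on sampled frames.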
