Web-Scale Multimedia Information Networks

This paper proposes a unified structured representation called MINets to combine information in the multimedia content network, cross-media links, and the associated ontological network.

The abundance of multimedia data on the Web presents both challenges (how to annotate, search, and mine it) and opportunities (crawling the Web to create large structured multimedia databases that support effective inference). Because of the huge data volume, treating all semantic concepts as if they sat on the same flat level is not viable. In this paper, we introduce a unified structured representation called multimedia information networks (MINets), which incorporates ontology and cross-media links and covers both content and context knowledge. MINets are constructed automatically from web-scale data using state-of-the-art information extraction and knowledge base population techniques, and this process builds and expands the ontology and cross-media structures. The resulting MINet contains a wide range of linkages, including logical, statistical, and semantic relations among informative concept nodes, which connect the growing ontology with web-scale cross-media resources. The raw data collected in the construction phase often contain noisy, incomplete, or even conflicting information that could be detrimental to information extraction and utilization. The redundant link structure can then be used to distill MINets and improve the quality of information (QoI). Moreover, advanced inference theories and systems can be built upon the linked MINets, so that high-level ontological knowledge can be inferred and integrated into a logically coherent network structure that is consistent with human cognition. Finally, acting as information channels, the ontology and cross-media links in MINets connect knowledge resources to one another, which makes information more portable between resources and raises the level of information utilization.
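As a rough illustration of the structure described above, the following sketch models a MINet as a set of typed links between nodes and uses link redundancy as a simple distillation criterion. It is not the paper's method: the class name `MINet`, the node identifiers, the relation labels, and the `min_support` threshold are all hypothetical, and "distillation" is reduced here to keeping only links that are backed by at least a given number of independent evidence sources.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MINet:
    """Toy multimedia information network (hypothetical sketch).

    Each link is a (source, relation, target) triple mapped to the set of
    evidence sources that asserted it.
    """
    support: dict = field(default_factory=lambda: defaultdict(set))

    def add_link(self, src, relation, dst, evidence):
        # Record one more evidence source for this typed link.
        self.support[(src, relation, dst)].add(evidence)

    def distill(self, min_support=2):
        # Keep only links confirmed by at least `min_support` sources
        # (an assumed stand-in for redundancy-based noise filtering).
        kept = MINet()
        for link, evidence in self.support.items():
            if len(evidence) >= min_support:
                kept.support[link] = set(evidence)
        return kept

    def links(self):
        # Sorted list of (source, relation, target) triples.
        return sorted(self.support)


net = MINet()
# Cross-media link: an image tied to a concept, seen by two sources.
net.add_link("img:42", "depicts", "concept:dog", "visual_classifier")
net.add_link("img:42", "depicts", "concept:dog", "surrounding_text")
# Ontological link, also seen by two sources.
net.add_link("concept:dog", "is_a", "concept:animal", "ontology")
net.add_link("concept:dog", "is_a", "concept:animal", "text_extraction")
# Noisy link with a single, unconfirmed source.
net.add_link("img:42", "depicts", "concept:cat", "noisy_tag")

clean = net.distill(min_support=2)
for link in clean.links():
    print(link)
```

In this toy network the singly supported `img:42 depicts concept:cat` link is dropped, while the two links confirmed by independent sources survive. A real system would presumably weight sources and propagate evidence along the graph rather than apply a fixed count threshold.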
