Learning Image-Text Associations

Web information fusion can be defined as the problem of collating and tracking information related to specific topics on the World Wide Web. Whereas most existing work on Web information fusion has focused on text-based multidocument summarization, this paper addresses image-text association, a cornerstone of cross-media Web information fusion. Specifically, we present two learning methods for discovering the underlying associations between images and texts from small training data sets. The first method, based on vague transformation, measures the information similarity between visual features and textual features through a set of predefined domain-specific information categories. The second method uses a neural network to learn a direct mapping between visual and textual features by automatically and incrementally summarizing the associated features into a set of information templates. Although the two methods take distinct approaches, our experimental results on a document set from the terrorism domain show that both can learn associations between images and texts from a small training data set.
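The category-mediated matching used by the first method can be illustrated with a small sketch. It assumes that each training image and each training text is a non-negative feature vector (for example, counts of visual words or of terms) labelled with one of the predefined information categories, and it estimates feature-to-category weights by normalized co-occurrence counts. That estimator, the category names, and the functions `learn_transform`, `to_categories`, and `association` are hypothetical choices for illustration, not the paper's exact formulation. The point it shows is that an image and a text are never compared feature by feature, only through their projections onto the shared category space.

```python
import math


# Hypothetical sketch of category-mediated (vague-transformation-style) matching.
# Assumption: P(category | feature) is estimated by normalized co-occurrence counts.

def learn_transform(samples, num_features, categories):
    """Return T where T[i][k] estimates P(categories[k] | feature i).

    samples: list of (feature_vector, category) pairs; each feature_vector is a
    list of num_features non-negative weights.
    """
    index = {c: k for k, c in enumerate(categories)}
    counts = [[0.0] * len(categories) for _ in range(num_features)]
    for vec, cat in samples:
        for i, w in enumerate(vec):
            counts[i][index[cat]] += w
    # Normalize each row; features never seen in training carry no evidence.
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in counts]


def to_categories(vec, T):
    """Project a feature vector into category space: c_k = sum_i x_i * T[i][k]."""
    scores = [0.0] * len(T[0])
    for i, w in enumerate(vec):
        for k, t in enumerate(T[i]):
            scores[k] += w * t
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def association(img_vec, txt_vec, T_img, T_txt):
    """Score an image-text pair by comparing their category distributions."""
    return cosine(to_categories(img_vec, T_img), to_categories(txt_vec, T_txt))


# Toy usage with two hypothetical categories and three features per modality.
categories = ["explosion", "rescue"]
T_img = learn_transform(
    [([1, 0, 0], "explosion"), ([1, 1, 0], "explosion"), ([0, 0, 1], "rescue")],
    3, categories)
T_txt = learn_transform(
    [([1, 0, 0], "explosion"), ([0, 1, 1], "rescue")], 3, categories)
print(association([0, 0, 1], [0, 1, 0], T_img, T_txt))  # 1.0
```

In the toy data, the image vector `[0, 0, 1]` and the text vector `[0, 1, 0]` share no feature index, yet they score as fully associated because both project onto the hypothetical "rescue" category.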

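The incremental template learning attributed to the second method can be sketched with a simplified two-channel learner in the style of adaptive resonance theory (ART). ART is one common way to summarize paired inputs into templates online, but the network used in the paper may differ in its details. In this sketch, which omits complement coding for brevity, each template holds a visual part and a textual part. A new image-text pair either refines the best-matching template, when both channels pass their vigilance tests (`rho_img`, `rho_txt`), or creates a new template. The class name, parameter values, and the `association` scoring rule (coverage of the text's features by the textual part of the template whose visual part best fits the image) are all hypothetical.

```python
# Hypothetical, simplified two-channel ART-style template learner.
# Assumptions: inputs are non-negative vectors, no complement coding,
# fuzzy-ART choice/match/learning rules applied per channel.

def fuzzy_and(a, b):
    """Element-wise minimum (the fuzzy AND used by fuzzy ART)."""
    return [min(x, y) for x, y in zip(a, b)]


def norm1(v):
    return sum(v)


class TemplateLearner:
    def __init__(self, rho_img=0.6, rho_txt=0.6, alpha=0.01, beta=1.0):
        self.rho = (rho_img, rho_txt)  # per-channel vigilance thresholds
        self.alpha = alpha             # choice parameter
        self.beta = beta               # learning rate (1.0 = fast learning)
        self.templates = []            # list of (visual_part, textual_part)

    def _choice(self, inputs, template):
        # Sum of per-channel fuzzy ART choice values.
        return sum(norm1(fuzzy_and(x, w)) / (self.alpha + norm1(w))
                   for x, w in zip(inputs, template))

    def _matches(self, inputs, template):
        # Vigilance test: every non-empty channel must be covered well enough.
        for x, w, rho in zip(inputs, template, self.rho):
            if norm1(x) > 0 and norm1(fuzzy_and(x, w)) / norm1(x) < rho:
                return False
        return True

    def _update(self, x, w):
        # Fuzzy ART learning rule: w <- beta * (x AND w) + (1 - beta) * w
        return [self.beta * m + (1 - self.beta) * wi
                for m, wi in zip(fuzzy_and(x, w), w)]

    def learn(self, img, txt):
        """Refine the best resonating template, or create a new one; return its index."""
        inputs = (img, txt)
        order = sorted(range(len(self.templates)),
                       key=lambda j: self._choice(inputs, self.templates[j]),
                       reverse=True)
        for j in order:
            vis, tex = self.templates[j]
            if self._matches(inputs, (vis, tex)):
                self.templates[j] = (self._update(img, vis), self._update(txt, tex))
                return j
        self.templates.append((list(img), list(txt)))
        return len(self.templates) - 1

    def association(self, img, txt):
        """Fraction of the text's features covered by the template that best fits the image."""
        if not self.templates or norm1(txt) == 0:
            return 0.0
        vis, tex = max(self.templates,
                       key=lambda t: norm1(fuzzy_and(img, t[0])) / (self.alpha + norm1(t[0])))
        return norm1(fuzzy_and(txt, tex)) / norm1(txt)


learner = TemplateLearner()
learner.learn([1, 1, 0, 0], [1, 0, 0])
learner.learn([1, 1, 1, 0], [1, 0, 0])  # resonates with template 0
learner.learn([0, 0, 1, 1], [0, 1, 1])  # fails vigilance, creates template 1
print(len(learner.templates))                          # 2
print(learner.association([0, 0, 1, 1], [0, 1, 1]))   # 1.0
```

Because no training pair is stored verbatim once it resonates with an existing template, the template set stays small even as pairs arrive one at a time, which is the kind of incremental summarization the abstract describes.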