Multifold Concept Relationships Metrics

Establishing relationships between concepts from large-scale, real-world click data collected by a commercial search engine is challenging because the click data is noisy: queries contain typos, and the same concept may be expressed by different queries. In this paper, we propose an approach for automatically establishing concept relationships. We first define five specific relationships between concepts and use them to annotate images collected from a commercial search engine. We then extract conceptual features in the textual and visual domains to train a concept model. Each pair of concepts is then classified into one of the five relationships. Experimental results demonstrate that the proposed approach is more effective than Google Distance.
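For context on the baseline, the Google Distance mentioned above is commonly computed as the Normalized Google Distance (NGD) from search hit counts. The following is a minimal sketch of that standard formula, not the method proposed in this paper; the function name, parameter names, and the example counts are assumptions made for illustration.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Sketch of NGD from hit counts (all names here are illustrative).

    fx, fy : number of pages containing concept x, concept y
    fxy    : number of pages containing both x and y
    n      : total number of indexed pages (assumed larger than fx and fy)
    """
    # With no occurrences or no co-occurrence, treat the concepts as unrelated.
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    # NGD = (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy))
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical counts: two concepts that always co-occur have distance 0.
same = normalized_google_distance(100, 100, 100, 10_000)
# Hypothetical counts: two concepts that rarely co-occur are farther apart.
far = normalized_google_distance(1_000, 1_000, 10, 1_000_000)
```

Smaller values indicate concepts that co-occur more often. NGD yields a single scalar per concept pair, whereas the approach described here assigns each pair to one of five relationship classes.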
