Graph based multi-modality learning

To better understand the content of multimedia, a lot of research efforts have been made on how to learn from multi-modal feature. In this paper, it is studied from a graph point of view: each kind of feature from one modality is represented as one independent graph; and the learning task is formulated as inferring from the constraints in every graph as well as supervision information (if available). For semi-supervised learning, two different fusion schemes, namely linear form and sequential form, are proposed. For each scheme, it is derived from optimization point of view; and further justified from two sides: similarity propagation and Bayesian interpretation. By doing so, we reveal the regular optimization nature, transductive learning nature as well as prior fusion nature of the proposed schemes, respectively. Moreover, the proposed method can be easily extended to unsupervised learning, including clustering and embedding. Systematic experimental results validate the effectiveness of the proposed method.

[1]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[2]  Hongjun Lu,et al.  ReCoM: reinforcement clustering of multi-type interrelated data objects , 2003, SIGIR.

[3]  Changshui Zhang,et al.  Multi-view EM algorithm and its application to color image segmentation , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[4]  Wei-Ying Ma,et al.  Hierarchical clustering of WWW image search results using visual, textual and link information , 2004, MULTIMEDIA '04.

[5]  Mikhail Belkin,et al.  Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.

[6]  Jingrui He,et al.  Manifold-ranking based image retrieval , 2004, MULTIMEDIA '04.

[7]  Martin Heckmann,et al.  Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition , 2002, EURASIP J. Adv. Signal Process..

[8]  Jing Huang,et al.  Image indexing using color correlograms , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[9]  Tat-Seng Chua,et al.  A bootstrapping framework for annotating and retrieving WWW images , 2004, MULTIMEDIA '04.

[10]  S. Sclaroff,et al.  Combining textual and visual cues for content-based image retrieval on the World Wide Web , 1998, Proceedings. IEEE Workshop on Content-Based Access of Image and Video Libraries (Cat. No.98EX173).

[11]  Rayid Ghani,et al.  Combining Labeled and Unlabeled Data for MultiClass Text Categorization , 2002, ICML.

[12]  Hideyuki Tamura,et al.  Textural Features Corresponding to Visual Perception , 1978, IEEE Transactions on Systems, Man, and Cybernetics.

[13]  Bernhard Schölkopf,et al.  Transductive Inference with Graphs , 2004 .

[14]  Hans-Peter Kriegel,et al.  Clustering Multi-represented Objects with Noise , 2004, PAKDD.

[15]  Steffen Bickel,et al.  Multi-view clustering , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[16]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.

[17]  Juergen Luettin,et al.  Audio-Visual Speech Modeling for Continuous Speech Recognition , 2000, IEEE Trans. Multim..

[18]  Michael J. Swain,et al.  Color indexing , 1991, International Journal of Computer Vision.

[19]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[20]  Wei-Ying Ma,et al.  Locality preserving clustering for image database , 2004, MULTIMEDIA '04.

[21]  Josef Kittler,et al.  Combining classifiers , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[22]  Rayid Ghani,et al.  Analyzing the effectiveness and applicability of co-training , 2000, CIKM '00.

[23]  Rong Yan,et al.  The combination limit in multimedia retrieval , 2003, MULTIMEDIA '03.

[24]  Edward Y. Chang,et al.  Optimal multimodal fusion for multimedia data analysis , 2004, MULTIMEDIA '04.

[25]  Bernhard Schölkopf,et al.  Ranking on Data Manifolds , 2003, NIPS.

[26]  B. Schölkopf,et al.  A Regularization Framework for Learning from Graph Data , 2004, ICML 2004.

[27]  Stéphane Mallat,et al.  A Theory for Multiresolution Signal Decomposition: The Wavelet Representation , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[28]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[29]  Chalapathy Neti,et al.  Frame-dependent multi-stream reliability indicators for audio-visual speech recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[30]  Ching Y. Suen,et al.  Multiple Classifier Combination Methodologies for Different Output Levels , 2000, Multiple Classifier Systems.