Learning α-integration with partially-labeled data

Sensory data integration is an important task in the human brain for multimodal processing, as well as in machine learning for multisensor processing. α-integration was proposed by Amari as a principled way of blending multiple positive measures (e.g., stochastic models in the form of probability distributions), providing an integration that is optimal in the sense of minimizing the α-divergence. It also encompasses existing integration methods as special cases, e.g., the weighted average and the exponential mixture. In α-integration, the value of α determines the characteristics of the integration, and the weight vector w assigns a degree of importance to each measure. In most existing work, however, α and w are given in advance rather than learned. In this paper we present two algorithms for learning α and w from data when only a few integrated target values are available. Numerical experiments on synthetic as well as real-world data confirm the proposed method's effectiveness.
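To make the role of α concrete, the sketch below implements the α-mean underlying α-integration: each positive measure is mapped through f_α(z) = z^((1-α)/2) (or log z when α = 1), the results are averaged with weights w, and the inverse map is applied. The function name `alpha_mean` and the toy inputs are illustrative, not from the paper; with equal weights, α = -1 recovers the arithmetic mean (weighted average), α = 1 the geometric mean (exponential mixture), and α = 3 the harmonic mean.

```python
import numpy as np

def alpha_mean(measures, weights, alpha):
    """Weighted alpha-mean of positive measures.

    Computes f_alpha^{-1}( sum_i w_i * f_alpha(p_i) ), where
    f_alpha(z) = log(z) if alpha == 1, else z ** ((1 - alpha) / 2).
    """
    p = np.asarray(measures, dtype=float)  # shape: (num_measures, num_points)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize weights to sum to 1
    if alpha == 1:
        # Limit case: weighted geometric mean (exponential mixture, unnormalized)
        return np.exp(np.tensordot(w, np.log(p), axes=1))
    f = p ** ((1.0 - alpha) / 2.0)
    return np.tensordot(w, f, axes=1) ** (2.0 / (1.0 - alpha))

# Two distributions over two states, equal weights
p = np.array([[0.2, 0.8], [0.6, 0.4]])
w = np.array([0.5, 0.5])
print(alpha_mean(p, w, alpha=-1))  # arithmetic mean: [0.4, 0.6]
print(alpha_mean(p, w, alpha=1))   # geometric mean
print(alpha_mean(p, w, alpha=3))   # harmonic mean
```

The paper's contribution is to treat α and w in this formula as free parameters and fit them from a small number of integrated target values, rather than fixing them in advance.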
