Alternative Multiview Maximum Entropy Discrimination

Maximum entropy discrimination (MED) is a general framework for discriminative estimation based on the maximum entropy and maximum margin principles, and it recovers hard-margin support vector machines under certain assumptions. Recently, a multiview version of MED, multiview MED (MVMED), was proposed. In this paper, we explore a more natural MVMED framework by assuming two separate distributions: p1(Θ1) over the first-view classifier parameter Θ1 and p2(Θ2) over the second-view classifier parameter Θ2. We call the new framework alternative MVMED (AMVMED); it enforces the posteriors of the two view margins to be equal. AMVMED is more flexible than the existing MVMED: whereas MVMED optimizes a single relative entropy, AMVMED assigns one relative entropy term to each of the two views, thus incorporating a tradeoff between the views. We give the detailed solving procedure, which is divided into two steps. The first step solves the optimization problem without the constraint that the margin posteriors from the two views be equal; the second step then imposes that constraint. Experimental results on multiple real-world data sets verify the effectiveness of AMVMED, and comparisons with MVMED are also reported.
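
For orientation, the structure described above can be written schematically. The first problem is the standard single-view MED formulation. The second is only a sketch of the AMVMED idea as stated in the abstract; the tradeoff weight ρ, the priors q_v, the discriminant functions L_v, and the exact way the relative entropy terms are weighted and factorized are assumptions made here for illustration and may differ from the formulation in the paper.

```latex
% Standard MED: one distribution over parameters \Theta and margins \gamma
\begin{align}
  \min_{p(\Theta,\gamma)} \quad
    & \mathrm{KL}\bigl(p(\Theta,\gamma)\,\|\,p_0(\Theta,\gamma)\bigr) \\
  \text{s.t.} \quad
    & \int p(\Theta,\gamma)\,\bigl[\,y_t L(X_t \mid \Theta) - \gamma_t\,\bigr]\,
      d\Theta\, d\gamma \;\ge\; 0, \qquad 1 \le t \le N .
\end{align}

% Schematic AMVMED (assumed notation): one distribution per view v = 1, 2,
% one relative entropy term per view, and equal margin posteriors
\begin{align}
  \min_{p_1(\Theta_1,\gamma),\; p_2(\Theta_2,\gamma)} \quad
    & \rho\,\mathrm{KL}\bigl(p_1(\Theta_1,\gamma)\,\|\,q_1(\Theta_1,\gamma)\bigr)
      + (1-\rho)\,\mathrm{KL}\bigl(p_2(\Theta_2,\gamma)\,\|\,q_2(\Theta_2,\gamma)\bigr) \\
  \text{s.t.} \quad
    & \int p_v(\Theta_v,\gamma)\,\bigl[\,y_t L_v(X_t^{v} \mid \Theta_v) - \gamma_t\,\bigr]\,
      d\Theta_v\, d\gamma \;\ge\; 0, \qquad v = 1, 2,\;\; 1 \le t \le N, \\
    & p_1(\gamma) = p_2(\gamma), \qquad
      p_v(\gamma) = \int p_v(\Theta_v,\gamma)\, d\Theta_v .
\end{align}
```

In this reading, the first step of the solving procedure corresponds to dropping the last constraint, which decouples the problem into two single-view MED problems, and the second step reinstates it.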
