Multi-Instance Mixture Models and Semi-Supervised Learning

Multi-instance (MI) learning is a variant of supervised learning in which each labeled example consists of a bag (i.e., a multiset) of feature vectors rather than a single feature vector. Under standard assumptions, MI learning can be viewed as a form of semi-supervised learning (SSL). The key difference is that in MI learning, positive bag labels provide weak label information for the instances they contain. MI learning tasks can therefore be approximated as SSL tasks by discarding this weak label information, allowing existing SSL techniques to be applied directly. To give insight into this connection, we first introduce multi-instance mixture models (MIMMs), an adaptation of mixture-model classifiers to multi-instance data. We show how to learn such models with an Expectation-Maximization (EM) algorithm in the case where the instance-level class distributions are members of an exponential family. The cost of the semi-supervised approximation to multi-instance learning is then explored, both theoretically and empirically, by analyzing the properties of MIMMs relative to semi-supervised mixture models.
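
To make the semi-supervised approximation concrete, the following is a minimal sketch in Python, assuming univariate Gaussian class-conditional densities (a member of the exponential family). Instances from negative bags are treated as hard-labeled negatives, while instances pooled from positive bags are treated as unlabeled, discarding the weak "at least one positive instance" constraint; a two-component mixture is then fitted with EM. All names here (em_ssl_mimm, neg_instances, pos_bags) are illustrative and do not come from the paper.

    import numpy as np
    from scipy.stats import norm

    def em_ssl_mimm(neg_instances, pos_bags, n_iter=50):
        # Semi-supervised EM for a two-component (negative/positive)
        # Gaussian mixture. neg_instances: 1-D array of instances from
        # negative bags (hard-labeled negative). pos_bags: list of 1-D
        # arrays, one per positive bag; their instances are unlabeled.
        x = np.concatenate(pos_bags)              # unlabeled pool
        mu0, sd0 = neg_instances.mean(), neg_instances.std() + 1e-6
        mu1, sd1 = x.mean(), x.std() + 1e-6       # crude initialisation
        pi1 = 0.5                                 # positive-component weight
        for _ in range(n_iter):
            # E-step: posterior responsibility of the positive component
            # for each unlabeled instance; labeled negatives stay at 0.
            p1 = pi1 * norm.pdf(x, mu1, sd1)
            p0 = (1.0 - pi1) * norm.pdf(x, mu0, sd0)
            r = p1 / (p1 + p0 + 1e-12)
            # M-step: weighted maximum-likelihood updates; the negative
            # component also absorbs the hard-labeled negatives.
            w1 = r.sum() + 1e-12
            mu1 = (r * x).sum() / w1
            sd1 = np.sqrt((r * (x - mu1) ** 2).sum() / w1) + 1e-6
            w0 = (1.0 - r).sum() + len(neg_instances)
            mu0 = (((1.0 - r) * x).sum() + neg_instances.sum()) / w0
            sd0 = np.sqrt((((1.0 - r) * (x - mu0) ** 2).sum()
                           + ((neg_instances - mu0) ** 2).sum()) / w0) + 1e-6
            pi1 = r.sum() / len(x)                # mixing weight, unlabeled pool
        return (mu0, sd0), (mu1, sd1), pi1

Under the standard MI assumption, a new bag would then be scored positive if the posterior positive probability of its most positive instance exceeds a threshold such as 0.5. Updating the mixing weight only over the unlabeled pool is one of several reasonable design choices in this sketch; the point is merely that, once the weak bag-level information is discarded, ordinary semi-supervised mixture-model EM applies unchanged.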
