Multiple-Instance Regression with Structured Data

We present MI-ClusterRegress, a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels; each bag contains a set of unlabeled items, and the relevance of each item to its bag's label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on structured bags, that is, bags whose items are drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or with MIR methods that do not model the bag structure.
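To make the idea concrete, the following is a minimal sketch of cluster-based multiple-instance regression. It is not the authors' implementation: the abstract does not specify the clustering or regression models, so this sketch assumes hard k-means clustering over the pooled items, a per-bag cluster mean as the bag's representative, ordinary least squares per cluster, and selection of the cluster with the lowest training error. It also runs these steps in sequence rather than learning them simultaneously. All function names and the toy data are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means (assumed stand-in for a model of bag structure)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def bag_exemplars(bags, centers):
    """For each bag and cluster, average the bag's items assigned to that cluster."""
    k, d = centers.shape
    out = np.zeros((len(bags), k, d))
    for i, bag in enumerate(bags):
        dist = ((bag[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = bag[labels == j]
            # Fall back to the cluster center if the bag has no item in cluster j.
            out[i, j] = members.mean(axis=0) if len(members) else centers[j]
    return out

def fit_cluster_regress(bags, y, k=2):
    """Fit one linear model per cluster; keep the cluster with lowest training MSE."""
    centers = kmeans(np.vstack(bags), k)
    exemplars = bag_exemplars(bags, centers)
    best = None
    for j in range(k):
        A = np.hstack([exemplars[:, j], np.ones((len(bags), 1))])  # add bias column
        w, *_ = np.linalg.lstsq(A, y, rcond=None)
        mse = np.mean((A @ w - y) ** 2)
        if best is None or mse < best[0]:
            best = (mse, j, w)
    _, j, w = best
    return centers, j, w

def predict(bags, model):
    """Predict bag labels from the exemplars of the selected cluster."""
    centers, j, w = model
    ex = bag_exemplars(bags, centers)[:, j]
    return np.hstack([ex, np.ones((len(bags), 1))]) @ w

def make_toy_bags(n_bags, seed):
    """Hypothetical data: 5 label-relevant items and 5 distractor items per bag."""
    rng = np.random.default_rng(seed)
    bags, labels = [], []
    for _ in range(n_bags):
        t = rng.uniform(-1.0, 1.0)
        relevant = np.array([t, 0.0]) + 0.05 * rng.standard_normal((5, 2))
        distractors = np.array([10.0, 10.0]) + 0.05 * rng.standard_normal((5, 2))
        bags.append(np.vstack([relevant, distractors]))
        labels.append(3.0 * t + 1.0)  # label depends only on the relevant items
    return bags, np.array(labels)

if __name__ == "__main__":
    train_bags, train_y = make_toy_bags(40, seed=1)
    test_bags, test_y = make_toy_bags(20, seed=2)
    model = fit_cluster_regress(train_bags, train_y, k=2)
    print("mean absolute error:", np.mean(np.abs(predict(test_bags, model) - test_y)))
```

In this toy setting, each bag mixes items from two distributions and only one of them carries information about the label. A regression on features averaged over all items would blend in the distractors, whereas the per-cluster models let the training error indicate which cluster is relevant.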
