Complex Aggregates over Clusters of Elements

Complex aggregates have been proposed as a way to bridge the gap between approaches that handle sets by imposing conditions on specific elements and approaches that handle them by imposing conditions on aggregated values. A complex aggregate summarises a subset of the elements in a set, where the subset is defined by conditions on the elements' attribute values. In this paper, we present a new type of complex aggregate in which the subset is instead defined to be a cluster of the set. This is useful when the subsets relevant to the task at hand are difficult to describe in terms of attribute conditions. The work is motivated by the analysis of flow cytometry data, where each set consists of cells and the subsets are cell populations. We describe two approaches to aggregating over clusters on an abstract level and validate one of them empirically, motivating future research in this direction.
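
To make the distinction concrete, the following is a minimal sketch, not the method evaluated in the paper. It contrasts an aggregate over a subset selected by an attribute condition with an aggregate over a subset selected by cluster membership. The marker names (cd4, cd8), the toy values, the fixed centroids, and the nearest-centroid assignment are all assumptions made for illustration; the abstract does not specify the clustering procedure or the aggregate functions used.

```python
from statistics import mean

# Hypothetical toy "set": each element is a cell with two marker intensities.
cells = [
    {"cd4": 0.9, "cd8": 0.1},
    {"cd4": 0.8, "cd8": 0.2},
    {"cd4": 0.1, "cd8": 0.9},
    {"cd4": 0.2, "cd8": 0.7},
    {"cd4": 0.5, "cd8": 0.5},
]

def condition_aggregate(elements, condition, attribute, agg=mean):
    """Aggregate `attribute` over the elements that satisfy `condition`."""
    values = [e[attribute] for e in elements if condition(e)]
    return agg(values) if values else None

def nearest_centroid(element, centroids):
    """Index of the centroid closest to `element` (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((element[a] - c[a]) ** 2 for a in c)
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))

def cluster_aggregate(elements, centroids, cluster_id, attribute, agg=mean):
    """Aggregate `attribute` over the elements assigned to cluster `cluster_id`."""
    values = [
        e[attribute]
        for e in elements
        if nearest_centroid(e, centroids) == cluster_id
    ]
    return agg(values) if values else None

# Assumed centroids standing in for two cell populations; in practice they
# would come from a clustering algorithm rather than being fixed by hand.
centroids = [{"cd4": 0.85, "cd8": 0.15}, {"cd4": 0.15, "cd8": 0.8}]

# Subset defined by an attribute condition: cells with cd4 above a threshold.
by_condition = condition_aggregate(cells, lambda e: e["cd4"] > 0.6, "cd4")

# Subset defined by cluster membership: cells nearest to centroid 0.
by_cluster = cluster_aggregate(cells, centroids, 0, "cd4")

# A count-style aggregate over the other cluster.
size_of_cluster_1 = cluster_aggregate(cells, centroids, 1, "cd8", agg=len)
```

In this toy data the two selections happen to pick the same cells. The cluster-based form is meant for cases where no simple threshold on the attributes isolates the population of interest.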
