Local Subgroup Discovery for Eliciting and Understanding New Structure-Odor Relationships

From the molecule to its perception in the brain, olfaction is a complex phenomenon that neuroscience has yet to fully understand. One challenge is to establish comprehensive rules relating the physicochemical properties of molecules (e.g., weight, atom counts) to small, specific subsets of olfactory qualities (e.g., fruity, woody). This problem is particularly difficult because, according to current knowledge, molecular properties account for only \(30\,\%\) of the identity of an odor: predictive models fall short of providing universal rules. Descriptive approaches, however, make it possible to elicit local hypotheses, validated by domain experts, that help to understand the olfactory percept. Based on a new quality measure tailored for multi-labeled data with skewed distributions, our approach extracts the top-k non-redundant subgroups, interpreted as descriptive rules \(description \rightarrow \{subset\ of\ labels\}\). Our experiments on benchmark and olfaction datasets demonstrate the capabilities of our approach, with direct applications for the perfume and flavor industries.
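
To make the form of a descriptive rule \(description \rightarrow \{subset\ of\ labels\}\) concrete, the following is a minimal sketch of how one candidate subgroup could be scored on multi-labeled data. It is only an illustration under stated assumptions: the molecules, the attribute names (`weight`, `n_oxygen`), and the helper `f1_quality` are hypothetical, and the plain F1 score is used as a generic stand-in rather than the quality measure proposed in this work.

```python
from typing import Callable, Dict, List, Set

# Toy, made-up molecules (not real data): two numeric descriptors
# and a set of odor labels per molecule.
molecules: List[Dict] = [
    {"weight": 120.0, "n_oxygen": 2, "labels": {"fruity"}},
    {"weight": 130.0, "n_oxygen": 2, "labels": {"fruity", "sweet"}},
    {"weight": 220.0, "n_oxygen": 1, "labels": {"woody"}},
    {"weight": 140.0, "n_oxygen": 2, "labels": {"woody"}},
    {"weight": 210.0, "n_oxygen": 0, "labels": {"woody"}},
    {"weight": 125.0, "n_oxygen": 3, "labels": {"fruity"}},
]


def f1_quality(
    data: List[Dict],
    description: Callable[[Dict], bool],
    target: Set[str],
) -> float:
    """Score the rule `description -> target` with a plain F1 measure.

    Assumption: F1 is a generic placeholder, not the paper's measure.
    A molecule is a positive if it carries every label in `target`.
    """
    covered = [m for m in data if description(m)]
    positives = [m for m in data if target <= m["labels"]]
    hits = [m for m in covered if target <= m["labels"]]
    if not covered or not positives or not hits:
        return 0.0
    precision = len(hits) / len(covered)
    recall = len(hits) / len(positives)
    return 2 * precision * recall / (precision + recall)


def light_and_oxygenated(m: Dict) -> bool:
    """Hypothetical description: weight <= 150 and at least 2 oxygen atoms."""
    return m["weight"] <= 150.0 and m["n_oxygen"] >= 2


score = f1_quality(molecules, light_and_oxygenated, {"fruity"})
print(f"quality of (light, oxygenated) -> {{fruity}}: {score:.3f}")
```

On this toy data the description covers four molecules, three of which are labeled fruity, and it captures all fruity molecules. A top-k search would rank many such candidate descriptions by their score and discard those that are redundant with better-ranked ones.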
