SMELL AND LANGUAGE: DATACENTRIC APPROACH TO PREDICTING SMELL OF A MOLECULE

Predicting the smell of a molecule remains an open question. Approaches to understanding it have so far been chemocentric; with the advent of data analysis technologies, it is now possible to look at it from a datacentric point of view. We collected 3026 odoriferous molecules and their perceived smells from a large, publicly available online dataset. We developed a graphical method to analyze the smells, or perceptual descriptors, of the molecules using a normalized co-occurrence value, i.e. a normalized count of how many times two perceptual descriptors occur together. We developed a simple method to annotate the smell of a molecule with one of two groups, using a community detection technique of the kind generally used in social network analysis. We then developed a machine learning method that uses the physico-chemical properties of the molecules to predict the smell of a molecule. We report the MCC, accuracy, precision, recall and ROC values obtained from this procedure. The resulting ROC value of 0.7 indicates that such an analysis can be undertaken and is effective in predicting the smell of a molecule. The graphical method for analyzing the perceptual descriptor space offers interesting insights regarding language and olfaction, e.g. the clustering of typical words together and their unrelatedness to semantics.
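
As a rough illustration of the descriptor co-occurrence analysis described above, the following minimal Python sketch counts how often two perceptual descriptors are assigned to the same molecule and normalizes that count. The abstract does not state which normalization was used, so the Jaccard-style ratio here (molecules carrying both descriptors divided by molecules carrying either) is an assumption, as are the function name `normalized_cooccurrence` and the toy descriptor lists, which are invented for illustration and are not taken from the dataset.

```python
from collections import Counter
from itertools import combinations


def normalized_cooccurrence(molecule_descriptors):
    """Return {(a, b): score} for each descriptor pair seen together at least once.

    molecule_descriptors: one list of descriptor strings per molecule.
    Pair keys are alphabetically ordered tuples.
    """
    single = Counter()  # number of molecules carrying each descriptor
    pair = Counter()    # number of molecules carrying both descriptors of a pair
    for descriptors in molecule_descriptors:
        unique = sorted(set(descriptors))  # ignore duplicates within one molecule
        single.update(unique)
        pair.update(combinations(unique, 2))
    # ASSUMPTION: Jaccard-style normalization; the source does not specify one.
    return {
        (a, b): n / (single[a] + single[b] - n)
        for (a, b), n in pair.items()
    }


# Hypothetical toy data, one descriptor list per molecule.
toy = [
    ["fruity", "sweet"],
    ["fruity", "sweet", "floral"],
    ["sulfurous", "meaty"],
    ["sweet", "floral"],
]
weights = normalized_cooccurrence(toy)
for (a, b), w in sorted(weights.items()):
    print(f"{a:>10} - {b:<10} {w:.2f}")
```

The resulting scores can be read as edge weights of a descriptor graph, which is the kind of input a community detection routine would then split into groups; that step is not shown here.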