Mixing Statistical and Symbolic Approaches for Chemical Names Recognition

This paper investigates the problem of automatic chemical Term Recognition (TR) and proposes to tackle it by combining symbolic and statistical techniques. Unlike other solutions described in the literature, which rely solely on complex and costly hand-crafted rule-based matching algorithms, we show that combining a seven-rule matching algorithm with a naive Bayes classifier achieves high performance. Experiments on different kinds of available Organic Chemistry texts show that our hybrid approach is also consistent across data sets.
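
To make the idea of a hybrid recognizer concrete, the following is a minimal sketch, not the method evaluated in this paper. The three regular expressions are hypothetical stand-ins for the seven rules, the character-trigram features and the toy training list are assumptions chosen for illustration, and combining the two components with a simple logical OR is likewise an assumption.

```python
import math
import re
from collections import Counter

# Hypothetical rules (NOT the seven rules of the paper): a few common
# organic-chemistry affix patterns, used only for illustration.
RULES = [
    re.compile(r"^(meth|eth|prop|but|pent|hex)[a-z]*(ane|ene|yne|anol|anal|anone)$"),
    re.compile(r"^\d+(,\d+)*-[a-z]"),           # locant prefix such as "2,3-di..."
    re.compile(r"(yl|oxy|oate|amine|amide)$"),  # frequent chemical suffixes
]


def rule_match(token):
    """Symbolic component: True if any hypothetical rule fires."""
    t = token.lower()
    return any(r.search(t) for r in RULES)


def trigrams(token):
    """Character trigrams with boundary markers (assumed feature set)."""
    padded = f"^{token.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class NaiveBayes:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = {}        # label -> Counter of trigram counts
        self.docs = Counter()   # label -> number of training tokens
        self.vocab = set()

    def fit(self, tokens, labels):
        for tok, lab in zip(tokens, labels):
            feats = trigrams(tok)
            self.docs[lab] += 1
            self.counts.setdefault(lab, Counter()).update(feats)
            self.vocab.update(feats)
        return self

    def log_score(self, token, label):
        total = sum(self.counts[label].values())
        denom = total + self.alpha * len(self.vocab)
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for f in trigrams(token):
            score += math.log((self.counts[label][f] + self.alpha) / denom)
        return score

    def predict(self, token):
        return max(self.docs, key=lambda lab: self.log_score(token, lab))


def is_chemical(token, model):
    """Hybrid decision: accept if a rule fires OR the classifier says 'chem'."""
    return rule_match(token) or model.predict(token) == "chem"


# Toy training data, invented for this sketch.
chem = ["ethanol", "benzene", "toluene", "acetone", "pyridine", "phenol"]
other = ["table", "result", "method", "window", "paper", "system"]
model = NaiveBayes().fit(chem + other, ["chem"] * len(chem) + ["other"] * len(other))

for word in ["propane", "xylene", "table"]:
    print(word, is_chemical(word, model))
```

In this sketch "propane" is caught by the first rule, while "xylene" matches no rule and is accepted only through the statistical component, which is the kind of complementarity a hybrid approach aims for. A real system would need an annotated corpus and a decision on how to weigh the two components against each other.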