Fuzzy models and potential outliers

Outliers or distorted attributes very often severely interfere with data analysis algorithms that try to extract a few meaningful rules. Most methods for dealing with outliers simply ignore them. This can be harmful, since the very outlier that was ignored might have described a rare but still extremely interesting phenomenon. We describe an approach that tries to build an interpretable model while still maintaining all the information in the data. This is achieved through a two-stage process. The first stage builds an outlier model for data points of low relevance. The second stage uses this model as a filter and generates a simpler model that describes only examples of higher relevance and thus represents a more general concept. The outlier model, on the other hand, may point out potential areas of interest to the user. Preliminary experiments using an existing algorithm to construct fuzzy rule sets from data indicate that the two models in fact have lower complexity and sometimes even offer superior performance.
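
The two-stage process can be sketched roughly as follows. This is only an illustration under stated assumptions, not the algorithm used in the experiments: the relevance measure (number of same-class neighbours within a radius), the thresholds, and the names neighbour_relevance, two_stage_fit and memorize are all hypothetical, and the learner passed in as fit stands in for whatever fuzzy rule induction algorithm is actually used.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, class label)


def neighbour_relevance(examples: List[Example], index: int, radius: float) -> int:
    """Hypothetical relevance score: same-class examples within `radius`."""
    x, label = examples[index]
    count = 0
    for j, (other, other_label) in enumerate(examples):
        if j == index or other_label != label:
            continue
        dist = sum((a - b) ** 2 for a, b in zip(x, other)) ** 0.5
        if dist <= radius:
            count += 1
    return count


def two_stage_fit(
    examples: List[Example],
    fit: Callable[[List[Example]], object],
    radius: float = 1.0,
    min_relevance: int = 1,
):
    """Return (outlier_model, general_model) for the given learner `fit`."""
    # Stage 1: collect low-relevance examples and model them separately.
    low = [
        ex
        for i, ex in enumerate(examples)
        if neighbour_relevance(examples, i, radius) < min_relevance
    ]
    outlier_model = fit(low)

    # Stage 2: use the stage-1 result as a filter, then fit the remainder.
    # Assumption: the filter simply removes the examples assigned to stage 1.
    low_ids = {id(ex) for ex in low}
    high = [ex for ex in examples if id(ex) not in low_ids]
    general_model = fit(high)
    return outlier_model, general_model


def memorize(examples: List[Example]) -> List[Example]:
    """Placeholder learner that just stores its training examples."""
    return list(examples)


# Toy data: two tight clusters plus one class-"A" point inside the "B" region.
data: List[Example] = [
    ((0.0, 0.0), "A"), ((0.5, 0.0), "A"), ((0.0, 0.5), "A"),
    ((5.0, 5.0), "B"), ((5.5, 5.0), "B"), ((5.0, 5.5), "B"),
    ((5.2, 5.1), "A"),
]
outlier_model, general_model = two_stage_fit(data, memorize)
```

With this toy data the isolated class-"A" point ends up in the outlier model, where it remains available for inspection, while the general model is trained on the six remaining examples.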