Outlier analysis is an important task in data science, and finding outliers in categorical data is particularly difficult. To build an accurate classifier, exactly the right number of outliers must be eliminated from the data. If too few outliers are found, the rest remain in the data as obstacles, and an accurate classifier cannot be built from it. Conversely, if too many are found and eliminated, some genuine records are lost, and again an accurate classifier cannot be built. Because the data is categorical, the most infrequent records are treated as outliers in classification modeling. Frequent records are the most useful for building a classifier, while infrequent records are the least useful and disturb the model. How many outliers should be removed, however, remains a problem. Many effective outlier detection approaches exist for numerical data, but few methods exist for categorical datasets. This paper presents a new approach, the Normally distributed Outlier Factor by Infrequency (NOFI), to improve classifier accuracy. NOFI does not require k, the number of outliers, as input. Instead, it determines the number of outliers automatically from the infrequency of all possible combinations of the attribute values in each record. Experiments were conducted by applying NOFI to the bank dataset from the UCI Machine Learning Repository.
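The abstract does not give NOFI's exact scoring formula, so the following is only a hypothetical Python sketch of the idea it describes. Each record is expanded into all non-empty combinations of its attribute values, and each combination is counted across the dataset. A record's score is the average relative frequency of its combinations, so a low score means the record is infrequent. The name suggests a normal-distribution assumption, and the sketch borrows it: records scoring more than `z` standard deviations below the mean are flagged, so no k is supplied. The function names, the averaging step, and the default `z = 3` are assumptions made for illustration, not the paper's definitions.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def itemsets(record):
    """Yield every non-empty combination of (attribute index, value) pairs."""
    items = list(enumerate(record))  # keep the attribute position with each value
    for r in range(1, len(items) + 1):
        yield from combinations(items, r)


def infrequency_scores(data):
    """Hypothetical score: average relative support of a record's combinations."""
    counts = Counter()
    for record in data:
        counts.update(itemsets(record))  # count each combination once per record
    n = len(data)
    scores = []
    for record in data:
        subsets = list(itemsets(record))
        # Low values mean the record's attribute-value combinations are rare.
        scores.append(sum(counts[s] for s in subsets) / (n * len(subsets)))
    return scores


def nofi_outliers(data, z=3.0):
    """Flag records whose score lies more than z std devs below the mean (assumed rule)."""
    scores = infrequency_scores(data)
    mu, sigma = mean(scores), pstdev(scores)
    threshold = mu - z * sigma
    return [i for i, s in enumerate(scores) if s < threshold]


if __name__ == "__main__":
    data = [("a", "x", "p")] * 20 + [("b", "y", "q")]
    print(nofi_outliers(data))  # the single rare record, at index 20, is flagged
```

A record with m attributes yields 2^m - 1 combinations, so this brute-force enumeration grows exponentially with the number of attributes. It is meant to illustrate the scoring idea on small inputs, not to represent an efficient implementation.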