A Novel Technique to Find Outliers in Mixed Attribute Datasets
An outlier is a data point that differs significantly from the remaining data points. Outliers are also referred to as discordants, deviants, or abnormalities. Outliers can be of particular interest, as in credit card fraud detection, where they indicate fraudulent activity. Detecting them is therefore an interesting data mining task, referred to as outlier analysis. Efficient outlier detection is important in many fields, such as credit card fraud detection, medicine, law enforcement, and earth sciences. Many methods are available to identify outliers in numerical datasets, but only a limited number of methods are available for categorical and mixed-attribute datasets. This work proposes a novel outlier detection method that finds anomalies based on each record's "multi attribute outlier factor through correlation" score, which has great intuitive appeal. The algorithm uses the frequency of each value in the categorical part of the dataset and the correlation factor of each record with the mean record of the entire dataset. The categorical part is handled using the Attribute Value Frequency score (AVF score) concept. Results of the proposed method are compared with those of existing methods. The experiments in this paper use the Bank dataset (mixed attributes), taken from the UCI Machine Learning Repository.
Keywords: Outlier, Mixed Attribute Datasets, Attribute Value Frequency Score
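The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It assumes the usual definition of the AVF score (the mean frequency of a record's categorical values, so a low score suggests an outlier) and reads "correlation factor" as the Pearson correlation between a record's numerical values and the mean record. The function names and the toy records are hypothetical, and the abstract does not say how the two quantities are combined into the final score, so the sketch computes them separately.

```python
from collections import Counter
from math import sqrt


def avf_scores(records):
    """AVF score per record: mean frequency of its categorical values."""
    n_attrs = len(records[0])
    # One frequency table per categorical attribute (column).
    counts = [Counter(row[j] for row in records) for j in range(n_attrs)]
    return [
        sum(counts[j][row[j]] for j in range(n_attrs)) / n_attrs
        for row in records
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant vector counts as uncorrelated
    return cov / (sx * sy)


def correlation_with_mean(records):
    """Correlation of each numerical record with the dataset's mean record."""
    n = len(records)
    mean_record = [sum(col) / n for col in zip(*records)]
    return [pearson(row, mean_record) for row in records]


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the Bank dataset.
    categorical = [
        ("admin", "married"),
        ("admin", "married"),
        ("admin", "single"),
        ("student", "divorced"),
    ]
    numerical = [
        [1.0, 2.0, 3.0],
        [2.0, 4.0, 6.0],
        [3.0, 2.0, 1.0],
    ]
    print(avf_scores(categorical))
    print(correlation_with_mean(numerical))
```

In the toy categorical data, the fourth record holds only rare values and so receives the lowest AVF score. In the numerical data, the third record runs against the trend of the mean record and gets a negative correlation. In practice the numerical attributes would likely need to be brought to a common scale before correlating, which this sketch leaves out.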