Noise Removal Framework for Market Basket Analysis

The rapid development of data analysis has created an ever-growing demand for Market Basket Analysis. However, existing methods in this domain concentrate on techniques for selecting appropriate items. In this paper, we develop a framework for cleaning the dataset, based on the proposition that “Better noise removal brings out better data analysis”. Eliminating noisy objects is an essential goal of data preprocessing, because noise hampers data analysis. Recently developed data cleaning techniques concentrate on removing noise that results from low-level data errors caused by a defective data gathering process. However, data objects that are connected or related only at some particular time, or that are unrelated or unimportant, can also significantly interfere with data analysis. Thus, to improve data analysis to a greater extent, data that is noisy with respect to the underlying analysis must be removed during data preprocessing, which is one of the steps of Knowledge Discovery in Databases (KDD). Data cleaning strategies that remove all of these types of noise are therefore needed. Because data sets can contain enormous amounts of noise, these methods also need to be able to remove an extensive portion of the data. To improve data analysis in the presence of high noise intensity, this paper presents a method for noise removal.
