A vague assisted association analysis approach to repair Bigdata impurities

Big data consists of huge amounts of data, and accessing it poses many challenges. One major challenge is dataset impurity. Sometimes these impurities are visible as missing or incomplete information; at other times they are hidden as non-valuable attributes or unnecessary information. In this paper, a two-layered approach is defined to improve dataset integrity. In the first phase, the dataset is analyzed to identify impurities; in the second phase, the impurities are removed by deleting the affected attribute or tuple from the dataset. Association rules are then generated to verify dataset integrity.
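The two phases and the rule-based check described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's vague analysis. An attribute is treated as impure when more than half of its values are missing or when it holds a single distinct value (and so carries no information). Tuples that still contain gaps are dropped. A brute-force stand-in for an Apriori-style miner then lists attribute=value rules on the cleaned data. The thresholds, the `MISSING` markers, the sample data, and every function name are hypothetical.

```python
from itertools import combinations

# Hypothetical markers for a missing value (assumption, not from the paper).
MISSING = (None, "")


def is_missing(value):
    return value in MISSING


def find_impure_attributes(rows, max_missing_ratio=0.5):
    """Phase 1: flag attributes that are mostly missing or constant."""
    if not rows:
        return set()
    impure = set()
    for attr in {a for row in rows for a in row}:
        values = [row.get(attr) for row in rows]
        missing = sum(1 for v in values if is_missing(v))
        distinct = {v for v in values if not is_missing(v)}
        # Mostly-missing or single-valued attributes count as impurities.
        if missing / len(values) > max_missing_ratio or len(distinct) <= 1:
            impure.add(attr)
    return impure


def clean(rows, max_missing_ratio=0.5):
    """Phase 2: drop impure attributes, then drop tuples that still have gaps."""
    impure = find_impure_attributes(rows, max_missing_ratio)
    kept = []
    for row in rows:
        reduced = {a: v for a, v in row.items() if a not in impure}
        if not any(is_missing(v) for v in reduced.values()):
            kept.append(reduced)
    return kept, impure


def association_rules(rows, min_support=0.5, min_confidence=0.8, max_len=2):
    """Brute-force rules X -> Y over (attribute, value) items; small data only."""
    transactions = [frozenset(row.items()) for row in rows]
    if not transactions:
        return []
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({item for t in transactions for item in t}, key=repr)
    frequent = [frozenset(c) for k in range(1, max_len + 1)
                for c in combinations(items, k)
                if support(frozenset(c)) >= min_support]
    rules = []
    for itemset in frequent:
        # Split each frequent itemset of size >= 2 into antecedent/consequent.
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset, key=repr), k):
                lhs = frozenset(lhs)
                conf = support(itemset) / support(lhs)
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, support(itemset), conf))
    return rules


# Hypothetical sample: "source" is constant and "note" is mostly missing.
data = [
    {"outlook": "sunny", "windy": "no", "play": "yes", "source": "web", "note": None},
    {"outlook": "sunny", "windy": "no", "play": "yes", "source": "web", "note": ""},
    {"outlook": "rainy", "windy": "yes", "play": "no", "source": "web", "note": None},
    {"outlook": "sunny", "windy": None, "play": "yes", "source": "web", "note": "x"},
    {"outlook": "rainy", "windy": "yes", "play": "no", "source": "web", "note": None},
]

kept, impure = clean(data)
for lhs, rhs, sup, conf in association_rules(kept):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

On this sample, `source` and `note` are removed as attributes and the tuple with a missing `windy` value is removed as a tuple. The surviving rules (for example `outlook=sunny -> play=yes`) are what one would inspect to judge whether the cleaned dataset is still internally consistent. Exhaustive enumeration is chosen here only for brevity; a real big-data pipeline would need a scalable miner.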
