A Novel Integrated Classifier for Handling Data Warehouse Anomalies

Anomalies persist in the databases used across many commercial sectors and undermine the overall integrity of the data. Typically, duplicate, wrong and missed observations of spatial-temporal data prevent users from accurately utilising the recorded information. The literature describes various data cleaning methods, which fall into either deterministic or probabilistic approaches. However, we believe that to ensure maximum integrity, a data cleaning methodology must combine properties of both categories to eliminate anomalies effectively. To realise this, we propose a method that integrates deterministic and probabilistic classifiers using fusion techniques. We have empirically evaluated the proposed concept against state-of-the-art techniques and found that our approach improves the integrity of the resulting data set.
