Rule Mining and Classification in a Situation Assessment Application: A Belief-Theoretic Approach for Handling Data Imperfections

Managing data imprecision and uncertainty has become increasingly important, especially in situation awareness and assessment applications where the reliability of the decision-making process is critical (e.g., on military battlefields). These applications require 1) an effective methodology for modeling data imperfections and 2) procedures for enabling knowledge discovery and for quantifying and propagating partial or incomplete knowledge throughout the decision-making process. This paper proposes an association rule mining (ARM)-based classification algorithm that provides this functionality, built on a Dempster-Shafer belief-theoretic relational database (DS-DB) that can conveniently represent a wide class of data imperfections. To this end, various ARM-related notions are revisited so that they can be applied in the presence of data imperfections. A data structure called the belief itemset tree is used to efficiently extract frequent itemsets and generate association rules from the DS-DB. This rule set then serves as the basis for classifying an unknown data record whose attributes are represented via belief functions. The algorithms are validated on a simplified situation assessment scenario in which sensor observations may have introduced imperfections in both attribute values and class labels.
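As a rough illustration of what it means for an attribute to be "represented via belief functions", the following minimal sketch computes the standard Dempster-Shafer belief and plausibility of a set of values from a mass assignment. It is not the paper's DS-DB or belief itemset tree; the attribute values (`tank`, `truck`, `jeep`), the masses, and the function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet

# A mass function maps subsets of the frame of discernment (focal elements)
# to masses that sum to 1. Mass on a non-singleton set models ambiguity.
Mass = Dict[FrozenSet[str], float]


def belief(m: Mass, a: FrozenSet[str]) -> float:
    """Bel(A): total mass of non-empty focal elements contained in A."""
    return sum(v for b, v in m.items() if b and b <= a)


def plausibility(m: Mass, a: FrozenSet[str]) -> float:
    """Pl(A): total mass of focal elements that intersect A."""
    return sum(v for b, v in m.items() if b & a)


# Hypothetical sensor report for a single "vehicle type" attribute.
frame = frozenset({"tank", "truck", "jeep"})
m: Mass = {
    frozenset({"tank"}): 0.5,           # evidence pointing to "tank" alone
    frozenset({"tank", "truck"}): 0.3,  # evidence that cannot separate the two
    frame: 0.2,                         # complete ignorance
}

if __name__ == "__main__":
    query = frozenset({"tank"})
    print("Bel:", belief(m, query), "Pl:", plausibility(m, query))
```

The interval between belief and plausibility bounds the support for a value under the available evidence; a precise, certain attribute would place all mass on one singleton, making the two quantities coincide.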
