Constraint-Based Mining of Fault-Tolerant Patterns from Boolean Data

Thanks to a sustained research effort over the last few years, inductive queries on local patterns (e.g., set patterns) and their associated complete solvers have proved extremely useful in supporting knowledge discovery. The more we use such queries on real-life data, e.g., biological data, the more we are convinced that inductive queries should return fault-tolerant patterns. This is clearly the case for formal concept discovery from noisy datasets. We therefore study various extensions of formal concepts, a particular kind of bi-set, towards fault-tolerance. We compare three declarative specifications of fault-tolerant bi-sets by means of a constraint-based mining approach. Our framework provides a better understanding of the trade-off needed between extraction feasibility, completeness, relevance, and ease of interpretation of these fault-tolerant patterns. We report an original empirical evaluation on both synthetic and real-life medical data; it enables a comparison of the various proposals and motivates further directions of research.
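
To make the notion of a bi-set concrete, the following minimal sketch checks whether a pair (set of rows, set of columns) is a formal concept in a small Boolean matrix, and then relaxes the all-ones requirement by allowing a bounded number of zeros per row and per column. The toy matrix, the function names, and the threshold `alpha` are assumptions introduced for illustration only; the relaxation shown is one plausible form of fault-tolerance and is not claimed to be any of the three specifications compared in this work.

```python
# Toy Boolean matrix (hypothetical data): rows are objects, columns are items.
DATA = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]


def is_formal_concept(data, rows, cols):
    """Return True if (rows, cols) is a maximal all-ones rectangle in data."""
    rows, cols = set(rows), set(cols)
    if not rows or not cols:
        return False
    # Every cell of the bi-set must be 1.
    if any(data[r][c] == 0 for r in rows for c in cols):
        return False
    # Maximality on rows: no outside row contains all chosen columns.
    for r in range(len(data)):
        if r not in rows and all(data[r][c] for c in cols):
            return False
    # Maximality on columns: no outside column is shared by all chosen rows.
    for c in range(len(data[0])):
        if c not in cols and all(data[r][c] for r in rows):
            return False
    return True


def max_zeros_per_line(data, rows, cols):
    """Largest number of 0 cells found in any single row or column of the bi-set."""
    per_row = [sum(1 - data[r][c] for c in cols) for r in rows]
    per_col = [sum(1 - data[r][c] for r in rows) for c in cols]
    return max(per_row + per_col)


def tolerates(data, rows, cols, alpha):
    """Assumed relaxation: at most alpha zeros in each row and column of the bi-set."""
    return max_zeros_per_line(data, rows, cols) <= alpha


exact = ({0, 1, 2}, {0, 1})        # an all-ones, maximal rectangle
noisy = ({0, 1, 2}, {0, 1, 2})     # contains a single 0 at row 1, column 2

print(is_formal_concept(DATA, *exact))
print(is_formal_concept(DATA, *noisy))
print(tolerates(DATA, *noisy, alpha=1))
```

In this toy example the larger bi-set is rejected as a formal concept because of a single zero, yet it is accepted once one exception per row and per column is tolerated, which is the kind of behavior motivating fault-tolerant patterns in noisy data.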
