Mining logic models in the presence of noisy data

We consider a method for extracting knowledge from data in the form of Disjunctive Normal Form (DNF) formulas. The method is designed mainly for classification and consists of three steps: Discretization, Feature Selection, and Formula Extraction. Each step is formulated as an optimization problem and solved with an ad hoc algorithmic strategy. When used for classification, the approach is designed to separate the training data exactly, and is therefore prone to overfitting when the data contain a significant amount of noise. We analyze the main problems that arise when the method is applied to noisy data and propose extensions to each of the three steps.
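As an illustration of what exact separation by a DNF formula means, and of why a single mislabeled record can break it, consider the following minimal sketch. It is not the method described here: the representation of literals and clauses, the function names, and the toy records over features `a`, `b`, `c` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Assumed representation (hypothetical, for illustration only):
# a literal is (feature_name, required_value), a clause is a conjunction
# of literals, and a DNF formula is a disjunction of clauses.
Literal = Tuple[str, bool]
Clause = List[Literal]
DNF = List[Clause]
Record = Dict[str, bool]


def clause_holds(clause: Clause, record: Record) -> bool:
    """A conjunctive clause holds when every literal matches the record."""
    return all(record[name] == value for name, value in clause)


def dnf_holds(formula: DNF, record: Record) -> bool:
    """A DNF formula holds when at least one of its clauses holds."""
    return any(clause_holds(clause, record) for clause in formula)


def separates_exactly(formula: DNF, positives: List[Record],
                      negatives: List[Record]) -> bool:
    """True when the formula covers every positive and no negative record."""
    covers_all_positives = all(dnf_holds(formula, r) for r in positives)
    covers_no_negatives = not any(dnf_holds(formula, r) for r in negatives)
    return covers_all_positives and covers_no_negatives


# Toy formula: (a AND NOT b) OR c
formula: DNF = [[("a", True), ("b", False)], [("c", True)]]

positives = [
    {"a": True, "b": False, "c": False},
    {"a": False, "b": True, "c": True},
]
negatives = [
    {"a": True, "b": True, "c": False},
]

# A record that satisfies the formula but carries a negative label,
# standing in for label noise in the training data.
noisy_negative = {"a": False, "b": False, "c": True}

clean_ok = separates_exactly(formula, positives, negatives)
noisy_ok = separates_exactly(formula, positives, negatives + [noisy_negative])
```

On the clean toy data the formula separates the two classes. Once the mislabeled record is added, no clause containing only `c` can remain, so a procedure that insists on exact separation would have to add literals or clauses to accommodate the noise, which is the overfitting risk the abstract refers to.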
