Boolean Factor Analysis for Data Preprocessing in Machine Learning

We present two methods for preprocessing input data in machine learning (ML). The first extends the set of attributes describing the objects in the input data table with new attributes; the second replaces the original attributes with the new ones. Both methods build on formal concept analysis (FCA) and on Boolean factor analysis, which has recently been described in terms of FCA: the new attributes are defined by so-called factor concepts computed from the input data table. The methods are demonstrated on decision tree induction. We evaluate them experimentally by comparing the performance of decision trees induced from the original and from the preprocessed input data, using the standard decision tree induction algorithms ID3 and C4.5 on several benchmark datasets.
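To make the two preprocessing variants concrete, the following is a minimal Python sketch on a toy binary table. It is an illustration under stated assumptions, not the algorithm used in the paper: the function names are hypothetical, the formal concepts are enumerated by brute force, and the factor concepts are chosen by a simple greedy cover of the 1s in the table. Each new attribute holds for an object exactly when the object belongs to the extent of the corresponding factor concept.

```python
from itertools import combinations


def extent(rows, attrs):
    """Row indices of the objects that have every attribute in attrs."""
    return tuple(i for i, row in enumerate(rows) if all(row[j] for j in attrs))


def intent(rows, objs):
    """Column indices of the attributes shared by every object in objs."""
    return tuple(j for j in range(len(rows[0])) if all(rows[i][j] for i in objs))


def formal_concepts(rows):
    """All (extent, intent) pairs, found by closing every attribute subset.

    Brute force and exponential in the number of attributes: toy data only.
    """
    n_attrs = len(rows[0])
    found = set()
    for size in range(n_attrs + 1):
        for attrs in combinations(range(n_attrs), size):
            ext = extent(rows, attrs)
            found.add((ext, intent(rows, ext)))
    return sorted(found)  # sorted so that ties are broken deterministically


def greedy_factor_concepts(rows):
    """Assumed heuristic: pick concepts until their rectangles cover all 1s."""
    uncovered = {(i, j) for i, row in enumerate(rows)
                 for j, value in enumerate(row) if value}
    concepts = formal_concepts(rows)
    factors = []
    while uncovered:
        def gain(concept):
            ext, itt = concept
            return sum((i, j) in uncovered for i in ext for j in itt)
        best = max(concepts, key=gain)  # concept covering most uncovered 1s
        factors.append(best)
        uncovered -= {(i, j) for i in best[0] for j in best[1]}
    return factors


def factor_columns(rows, factors):
    """One new binary attribute per factor: 1 iff the object is in its extent."""
    return [[int(i in ext) for ext, _ in factors] for i in range(len(rows))]


def extend_attributes(rows, factors):
    """First variant: keep the original attributes and append the new ones."""
    return [list(row) + new
            for row, new in zip(rows, factor_columns(rows, factors))]


def replace_attributes(rows, factors):
    """Second variant: describe the objects by the new attributes only."""
    return factor_columns(rows, factors)


# Toy input: 4 objects (rows) described by 3 binary attributes (columns).
table = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
factors = greedy_factor_concepts(table)
extended = extend_attributes(table, factors)
replaced = replace_attributes(table, factors)
```

Either `extended` or `replaced` could then be passed to a decision tree learner in place of `table`. Because every concept's rectangle contains only 1s, the loop stops exactly when the Boolean product of the factor matrices reproduces the input table.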
