Semi-supervised Learning for Mixed-Type Data via Formal Concept Analysis

Only a few machine learning methods, such as decision tree-based classification, can handle mixed-type data sets containing both discrete (binary and nominal) and continuous (real-valued) variables; moreover, no semi-supervised learning method can treat such data sets directly. Here we propose a novel semi-supervised learning method for mixed-type data sets, called SELF (SEmi-supervised Learning via FCA), based on Formal Concept Analysis (FCA). SELF extracts a lattice structure via FCA while discretizing continuous variables, and learns classification rules effectively using this structure. Incomplete data sets with missing values can be handled directly by our method. We experimentally demonstrate that SELF is competitive with other supervised and semi-supervised learning methods. Our contribution is not only a novel semi-supervised learning method, but also a bridge between the two fields of conceptual analysis and knowledge discovery.
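
To make the idea of extracting a lattice from mixed-type data concrete, the following is a minimal sketch, not the SELF algorithm itself. It assumes equal-width binning for continuous variables (the abstract does not say which discretization SELF uses), treats a missing value as contributing no attribute, and enumerates the formal concepts of the resulting binary context by closing the object intents under intersection. The names `discretize`, `build_context`, and `concepts` are hypothetical and chosen only for illustration.

```python
def discretize(value, lo, hi, bins):
    """Map a real value to an equal-width bin index in [0, bins - 1] (assumed scheme)."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(max(idx, 0), bins - 1)


def build_context(rows, bins=2):
    """Turn mixed-type rows into a binary formal context.

    rows: list of dicts whose values are str (discrete), float (continuous),
    or None (missing). Returns one frozenset of attribute names per object.
    """
    # Observed range of every continuous variable.
    ranges = {}
    for row in rows:
        for key, value in row.items():
            if isinstance(value, float):
                lo, hi = ranges.get(key, (value, value))
                ranges[key] = (min(lo, value), max(hi, value))

    context = []
    for row in rows:
        attrs = set()
        for key, value in row.items():
            if value is None:
                continue  # a missing value simply yields no attribute
            if isinstance(value, float):
                lo, hi = ranges[key]
                attrs.add(f"{key}=bin{discretize(value, lo, hi, bins)}")
            else:
                attrs.add(f"{key}={value}")
        context.append(frozenset(attrs))
    return context


def concepts(context):
    """Enumerate all formal concepts as (extent, intent) pairs.

    Concept intents are exactly the intersections of object intents,
    together with the full attribute set (the intent of the empty extent).
    """
    all_attrs = frozenset().union(*context)
    intents = {all_attrs}
    for obj_attrs in context:
        intents |= {obj_attrs & intent for intent in intents}
    result = []
    for intent in intents:
        extent = frozenset(i for i, attrs in enumerate(context) if intent <= attrs)
        result.append((extent, intent))
    return result


# Toy data set: one nominal variable, one continuous variable, one missing value.
rows = [
    {"color": "red", "size": 1.0},
    {"color": "red", "size": 9.0},
    {"color": "blue", "size": 8.0},
    {"color": None, "size": 2.0},
]
found = concepts(build_context(rows, bins=2))
```

Ordering the resulting pairs by inclusion of their extents gives the concept lattice. This naive enumeration is only practical for small contexts; larger data sets would need a dedicated closed-set enumeration algorithm.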
