Unsupervised feature construction for improving data representation and semantics

The attribute-based format is the main data representation used by machine learning algorithms. When the attributes do not properly describe the initial data, performance degrades. Some algorithms address this problem by internally changing the representation space, but the newly constructed features rarely have any meaning for a human. We seek to construct, in an unsupervised way, new attributes that describe a given dataset more appropriately and, at the same time, remain comprehensible to a human user. We propose two algorithms that construct the new attributes as conjunctions of the initial primitive attributes or their negations. The generated feature sets have reduced correlations between features and capture some of the hidden relations between individuals in a dataset. For example, a feature such as $sky \wedge \neg building \wedge panorama$ is true for non-urban images and is more informative than simple features expressing the presence or absence of a single object. The notion of Pareto optimality is used to evaluate feature sets and to balance total correlation against the complexity of the resulting feature set. Statistical hypothesis testing is employed to automatically determine the values of the parameters used for constructing a data-dependent feature set. We show experimentally that our approaches construct informative feature sets on multiple datasets.
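To make the construction concrete, the following is a minimal illustrative sketch (not the authors' algorithm) of the candidate-generation step: enumerating conjunctive features over primitive boolean attributes and their negations, as in the $sky \wedge \neg building \wedge panorama$ example. The toy data and the fixed-arity enumeration are assumptions for illustration; the paper's algorithms additionally prune candidates via total correlation, Pareto optimality, and hypothesis testing.

```python
import itertools
import numpy as np

def literals(data):
    """Yield (name, column) pairs for each primitive attribute and its negation."""
    for name, col in data.items():
        yield name, col
        yield f"~{name}", ~col

def conjunctions(data, arity=2):
    """Enumerate conjunctive features of a given arity over distinct attributes."""
    feats = {}
    for combo in itertools.combinations(list(literals(data)), arity):
        names = [n for n, _ in combo]
        # Skip combinations reusing the same base attribute (e.g. a AND ~a).
        if len({n.lstrip("~") for n in names}) < arity:
            continue
        cols = [c for _, c in combo]
        feats[" & ".join(names)] = np.logical_and.reduce(cols)
    return feats

# Toy image-annotation data: presence/absence of objects in five images.
data = {
    "sky":      np.array([1, 1, 0, 1, 0], dtype=bool),
    "building": np.array([0, 1, 1, 0, 1], dtype=bool),
    "panorama": np.array([1, 0, 0, 1, 0], dtype=bool),
}

feats = conjunctions(data, arity=3)
# The non-urban-image feature from the abstract, true for images 1 and 4:
print(feats["sky & ~building & panorama"])
```

In a full implementation, each candidate feature set would then be scored on two objectives (total correlation and feature-set complexity), keeping only the Pareto-optimal sets.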
