Finding Essential Attributes from Binary Data

We consider data sets that consist of n-dimensional binary vectors representing positive and negative examples of some (possibly unknown) phenomenon. A subset S of the attributes (or variables) of such a data set is called a support set if the positive and negative examples can be distinguished using only the attributes in S. In this paper we study the problem of finding small support sets, a task that arises frequently in fields such as knowledge discovery, data mining, learning theory, and the logical analysis of data. We study the distribution of support sets in randomly generated data and discuss why finding small support sets is important. We propose several measures of separation (real-valued set functions over the subsets of attributes), formulate optimization models for finding the smallest subsets maximizing these measures, and devise efficient heuristic algorithms to solve these (typically NP-hard) optimization problems. We prove that several of the proposed heuristics have a guaranteed constant approximation ratio, and we report on computational experience comparing these heuristics with others from the literature on both randomly generated and real-world data sets.
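
To make the support-set notion concrete: every pair consisting of one positive and one negative example must differ in at least one attribute of S, so finding a minimum support set is exactly a minimum set cover over these pairs. The sketch below shows a Chvátal-style greedy cover for this reduction; it illustrates the set-cover connection rather than reproducing the paper's specific heuristics, and the function name `greedy_support_set` is ours.

```python
from itertools import product

def greedy_support_set(positives, negatives):
    """Greedy set-cover heuristic for support sets.

    Elements are (positive, negative) example pairs; attribute i
    "covers" a pair if the two vectors differ in position i.  We
    repeatedly pick the attribute covering the most uncovered pairs,
    which carries the classical ~ln(m) approximation guarantee.
    """
    n_attrs = len(positives[0])
    uncovered = set(product(range(len(positives)), range(len(negatives))))
    support = []
    while uncovered:
        # attribute separating the largest number of still-uncovered pairs
        best = max(
            (i for i in range(n_attrs) if i not in support),
            key=lambda i: sum(positives[p][i] != negatives[q][i]
                              for p, q in uncovered),
        )
        newly = {(p, q) for p, q in uncovered
                 if positives[p][best] != negatives[q][best]}
        if not newly:
            # some positive example coincides with a negative example
            raise ValueError("data are not separable by any attribute subset")
        support.append(best)
        uncovered -= newly
    return support

# Example: attribute 0 alone distinguishes positives from negatives.
pos = [(1, 0, 1, 0), (1, 1, 1, 0)]
neg = [(0, 0, 1, 1), (0, 1, 0, 1)]
print(greedy_support_set(pos, neg))  # -> [0]
```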
