Paper information - Combinatorial feature selection problems

Motivated by frequently recurring themes in information retrieval and related disciplines, we define a genre of problems called combinatorial feature selection problems. Given a set S of multidimensional objects, the goal is to select a subset K of relevant dimensions (or features) such that some desired property Π holds for the set S restricted to K. Depending on Π, the goal could be to either maximize or minimize the size of the subset K. Several well-studied feature selection problems can be cast in this form. We study the problems in this class derived from several natural and interesting properties Π, including variants of the classical p-center problem as well as problems akin to determining the VC-dimension of a set system. Our main contribution is a theoretical framework for studying combinatorial feature selection, providing (in most cases essentially tight) approximation algorithms and hardness results for several instances of these problems.
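To make the problem template concrete, the following is a minimal brute-force sketch of the "minimize |K|" variant. It is not an algorithm from the paper: the property used here (all objects stay pairwise distinct after restriction to K) is only an illustrative choice of Π, and the names `restrict`, `all_distinct`, and `smallest_feature_subset` are hypothetical.

```python
from itertools import combinations


def restrict(objects, dims):
    """Project every object onto the chosen dimensions (S restricted to K)."""
    return [tuple(obj[d] for d in dims) for obj in objects]


def all_distinct(projected):
    """Illustrative property (an assumption): no two projected objects coincide."""
    return len(set(projected)) == len(projected)


def smallest_feature_subset(objects, prop):
    """Exhaustively search for a minimum-size K such that prop holds on S|K.

    Tries subsets in order of increasing size, so the first hit is minimum.
    Runs in time exponential in the number of dimensions.
    """
    n_dims = len(objects[0])
    for size in range(n_dims + 1):
        for dims in combinations(range(n_dims), size):
            if prop(restrict(objects, dims)):
                return dims
    return None  # prop fails even when all dimensions are kept


# Four objects in three dimensions; the last dimension is constant.
S = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
K = smallest_feature_subset(S, all_distinct)
print(K)
```

Exhaustive search of this kind is only feasible for a handful of dimensions, which is why approximation algorithms and hardness results are the natural questions for this class of problems.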
