Unsupervised Personalized Feature Selection

Feature selection is effective in preparing high-dimensional data for a variety of learning tasks such as classification, clustering and anomaly detection. A vast majority of existing feature selection methods assume that all instances share some common patterns manifested in a subset of shared features. However, this assumption is not necessarily true in many domains where data instances could show high individuality. For example, in the medical domain, we need to capture the heterogeneous nature of patients for personalized predictive modeling, which could be characterized by a subset of instance-specific features. Motivated by this, we propose to study a novel problem of personalized feature selection. In particular, we investigate the problem in an unsupervised scenario as label information is usually hard to obtain in practice. To be specific, we present a novel unsupervised personalized feature selection framework UPFS to find some shared features by all instances and instance-specific features tailored to each instance. We formulate the problem into a principled optimization framework and provide an effective algorithm to solve it. Experimental results on real-world datasets verify the effectiveness of the proposed UPFS framework.

[1]  Mohamed S. Kamel,et al.  An Efficient Greedy Method for Unsupervised Feature Selection , 2011, 2011 IEEE 11th International Conference on Data Mining.

[2]  Huan Liu,et al.  Toward Personalized Relational Learning , 2017, SDM.

[3]  John Shawe-Taylor,et al.  Localized Lasso for High-Dimensional Regression , 2016, AISTATS.

[4]  Hiroshi Motoda,et al.  Computational Methods of Feature Selection , 2022 .

[5]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[6]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[7]  Jing Liu,et al.  Unsupervised Feature Selection Using Nonnegative Spectral Analysis , 2012, AAAI.

[8]  Jérôme Idier,et al.  Algorithms for Nonnegative Matrix Factorization with the β-Divergence , 2010, Neural Computation.

[9]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[10]  Bo Liu,et al.  Uncorrelated Group LASSO , 2016, AAAI.

[11]  Deng Cai,et al.  Laplacian Score for Feature Selection , 2005, NIPS.

[12]  Feiping Nie,et al.  Exclusive Feature Learning on Arbitrary Structures via \ell_{1, 2}-norm , 2014, NIPS.

[13]  Ying Cui,et al.  Convex Principal Feature Selection , 2010, SDM.

[14]  Liang Du,et al.  Unsupervised Feature Selection with Adaptive Structure Learning , 2015, KDD.

[15]  Huan Liu,et al.  Reconstruction-based Unsupervised Feature Selection: An Embedded Approach , 2017, IJCAI.

[16]  Hongning Wang,et al.  Modeling Social Norms Evolution for Personalized Sentiment Classification , 2016, ACL.

[17]  Stephen P. Boyd,et al.  Network Lasso: Clustering and Optimization in Large Graphs , 2015, KDD.

[18]  Mohamed S. Kamel,et al.  Efficient greedy feature selection for unsupervised learning , 2012, Knowledge and Information Systems.

[19]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[20]  Feiping Nie,et al.  Efficient and Robust Feature Selection via Joint ℓ2, 1-Norms Minimization , 2010, NIPS.

[21]  Kewei Cheng,et al.  Feature Selection , 2016, ACM Comput. Surv..

[22]  ChengXiang Zhai,et al.  Robust Unsupervised Feature Selection , 2013, IJCAI.

[23]  Jieping Ye,et al.  Multi-Task Feature Learning Via Efficient l2, 1-Norm Minimization , 2009, UAI.

[24]  Zi Huang,et al.  Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence ℓ2,1-Norm Regularized Discriminative Feature Selection for Unsupervised Learning , 2022 .

[25]  Jiayu Zhou,et al.  FORMULA: FactORized MUlti-task LeArning for task discovery in personalized medical models , 2015, SDM.

[26]  S. Thomas McCormick,et al.  Integer Programming and Combinatorial Optimization , 1996, Lecture Notes in Computer Science.

[27]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[28]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[29]  Fangzhao Wu,et al.  Personalized Microblog Sentiment Classification via Multi-Task Learning , 2016, AAAI.

[30]  Deng Cai,et al.  Unsupervised feature selection for multi-cluster data , 2010, KDD.

[31]  Huan Liu,et al.  Spectral feature selection for supervised and unsupervised learning , 2007, ICML '07.

[32]  Huan Liu,et al.  Challenges of Feature Selection for Big Data Analytics , 2016, IEEE Intelligent Systems.