Towards an efficient unsupervised feature selection methods for high-dimensional data

With the proliferation of data, its dimensionality has increased significantly, producing what is known as high-dimensional data. This increase in dimensionality results in redundant and non-representative features, which pose challenges to exis
