Introduction to High-Dimensionality

Abstract In an era in which the complexity and volume of the data available for machine learning grow daily, feature selection plays an important role, helping to reduce the “high dimensionality” of some problems. In this chapter, we present the problems posed by these “high-dimensional” datasets and their characteristics. Section 1.1 introduces the need for feature selection that arises with the advent of Big Data. In Section 1.2, we outline the main applications that are driving feature selection research. Then, in Section 1.3, we discuss the inherent characteristics of some problems that may hinder the feature selection process. Finally, we give an overview of the different chapters of this book in Section 1.4.
