Spatial distance join based feature selection

A Spatial Distance Join (SDJ) based feature selection method (SDJ-FS) is developed to extend the concept of Correlation Fractal Dimension (CFD) so that feature relevance and redundancy are handled jointly in supervised feature selection problems. The Pair-count Exponent (PCE) of the SDJ between different classes is proposed as the feature relevance measure, and the PCE of the SDJ over the entire dataset (i.e., the CFD of the dataset) as the feature redundancy measure. For the SDJ-FS method, an efficient divide-count approach with a backward elimination property is designed to calculate these SDJ based feature quality (relevance and redundancy) measures. Extensive evaluations on both synthetic and benchmark datasets demonstrate the capability of SDJ-FS to identify feature subsets of high relevance and low redundancy, along with the favorable performance of SDJ-FS compared with other reference feature selection methods (including those based on CFD). The success of SDJ-FS shows that SDJ provides a good framework for extending CFD to supervised feature selection problems and offers a new viewpoint for feature selection research.
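
To make the pair-count exponent concrete, the following is a minimal sketch, assuming the usual power-law reading in which the number of point pairs within distance r grows as r raised to the PCE, so that the exponent is the slope of the log-log pair-count plot. It uses brute-force distance computation rather than the divide-count approach of SDJ-FS, and the function name `pair_count_exponent`, its parameters, and the radius range are hypothetical choices made for illustration only.

```python
import numpy as np

def pair_count_exponent(a, b, radii, self_join=False):
    """Estimate the slope of log(pair count) versus log(radius).

    a, b      : arrays of shape (n, d) and (m, d)
    radii     : increasing sequence of positive distances
    self_join : if True, a and b are the same set and each unordered
                pair is counted once (self pairs are dropped)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    radii = np.asarray(radii, dtype=float)

    # Brute-force Euclidean distances between every point of a and of b.
    diff = a[:, None, :] - b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    if self_join:
        upper = np.triu_indices(len(a), k=1)
        dist = dist[upper]
    else:
        dist = dist.ravel()

    # Pair counts at each radius; skip radii with no pairs before taking logs.
    counts = np.array([(dist <= r).sum() for r in radii], dtype=float)
    keep = counts > 0
    slope, _ = np.polyfit(np.log(radii[keep]), np.log(counts[keep]), 1)
    return slope

rng = np.random.default_rng(0)
radii = np.logspace(-2, -1, 8)

# Two perfectly correlated features: the data lie on a line in 2-D.
t = rng.random(1000)
line = np.column_stack([t, t])

# Two independent features: the data fill the unit square.
square = rng.random((1000, 2))

# Self-join over the whole dataset (the CFD-style estimate).
cfd_line = pair_count_exponent(line, line, radii, self_join=True)
cfd_square = pair_count_exponent(square, square, radii, self_join=True)

# Join between two classes (here an arbitrary split of the square data).
labels = square[:, 0] > 0.5
pce_between = pair_count_exponent(square[labels], square[~labels], radii)
```

In this sketch the self-join estimate for the correlated pair of features comes out close to 1 and for the independent pair close to 2, which illustrates why the exponent of the whole dataset can flag a redundant feature. The join between classes is computed the same way on two disjoint subsets; how that exponent is turned into a relevance score is defined by the method itself and is not reproduced here.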
