PreCLAS: An Evolutionary Tool for Unsupervised Feature Selection

Several research areas now face data matrices that traditional clustering, regression, or classification strategies cannot handle well. For example, biological "omic" problems produce matrices with thousands or millions of rows and fewer than a hundred columns. This structure hinders traditional data analysis methods, so the number of rows must be reduced before analysis. This article presents PreCLAS, an unsupervised approach for preprocessing matrices with such dimension problems to obtain data suitable for clustering and classification strategies. PreCLAS searches for a submatrix with a drastically reduced number of rows, preferring rows that together exhibit some group structure. Experimentation was carried out in two stages. First, to assess its functionality, a benchmark dataset was studied in a clustering context. Then, a microarray dataset with genomic information was analyzed, and PreCLAS was used to select informative genes in the context of classification strategies. The experiments showed that the new method drastically reduces the number of rows of a matrix, performing unsupervised feature selection for both classification and clustering problems.
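To make the idea of an evolutionary search for a row subset with group structure more concrete, the following is a minimal sketch and not the PreCLAS implementation. The abstract does not specify the fitness function or the genetic operators, so everything here is an assumption: the fitness is the fraction of variance explained by a simple 2-means split of the columns, and the names `two_means_score` and `select_rows` are hypothetical.

```python
import random

def two_means_score(points, iters=10):
    """Assumed fitness: fraction of total variance explained by a 2-cluster split."""
    dim = len(points[0])

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def mean(pts):
        return [sum(p[d] for p in pts) / len(pts) for d in range(dim)]

    # Deterministic start: first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))
    for _ in range(iters):
        g0 = [p for p in points if dist2(p, c0) <= dist2(p, c1)]
        g1 = [p for p in points if dist2(p, c0) > dist2(p, c1)]
        if not g0 or not g1:
            return 0.0  # no two-group split found
        c0, c1 = mean(g0), mean(g1)
    centre = mean(points)
    total = sum(dist2(p, centre) for p in points)
    if total == 0:
        return 0.0
    within = sum(min(dist2(p, c0), dist2(p, c1)) for p in points)
    return 1.0 - within / total

def select_rows(matrix, k, pop_size=20, generations=30, seed=0):
    """Toy genetic search for k rows whose columns show group structure."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])

    def fitness(ind):
        # Each column becomes a point described only by the chosen rows.
        samples = [[matrix[r][c] for r in ind] for c in range(n_cols)]
        return two_means_score(samples)

    pop = [tuple(sorted(rng.sample(range(n_rows), k))) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        nxt = scored[:2]  # elitism: keep the two best individuals
        while len(nxt) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(scored, 3), key=fitness)
            b = max(rng.sample(scored, 3), key=fitness)
            # Crossover: draw k rows from the union of both parents.
            child = rng.sample(sorted(set(a) | set(b)), k)
            # Mutation: swap one chosen row for an unused one.
            if rng.random() < 0.3:
                unused = [r for r in range(n_rows) if r not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            nxt.append(tuple(sorted(child)))
        pop = nxt
    best = max(pop, key=fitness)
    return list(best), fitness(best)

# Toy data (assumed): 40 rows, 8 columns; rows 0-4 separate the columns
# into two groups, the remaining rows are noise.
data_rng = random.Random(1)
toy = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(40)]
for r in range(5):
    toy[r] = [data_rng.gauss(0, 0.2) + (10 if c >= 4 else 0) for c in range(8)]

rows, score = select_rows(toy, k=5)
print(rows, round(score, 3))
```

The score lies between 0 and 1, and higher values mean the selected rows split the columns into two tighter groups. A real application would likely replace this stand-in fitness with a proper clustering-tendency index and tune the population size and number of generations to the matrix at hand.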
