Heat Map Based Feature Selection: A Case Study for Ovarian Cancer

Public health is a critical issue, so there is great research interest in finding faster and more accurate methods to detect diseases. In the particular case of cancer, the use of mass spectrometry data has become very popular, but problems arise because the number of mass-to-charge ratios exceeds the number of patients in the samples by a huge margin. To deal with the high dimensionality of the data, most works agree on the need for pre-processing. In this work we propose an algorithm called Heat Map Based Feature Selection (HmbFS) that can work with huge data without the need for pre-processing, thanks to a built-in compression mechanism based on color quantization. Results show that our proposal is very competitive against some of the most popular algorithms and succeeds where other methodologies may fail due to the high dimensionality of the data.
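The abstract does not specify how the color quantization is carried out, so the following is only an illustrative sketch of the general idea and not the HmbFS procedure itself. It assumes uniform binning: the intensities recorded for one mass-to-charge ratio are mapped to a small number of integer levels, as if each value were assigned one color from a reduced palette. The function name `quantize` and the parameter `levels` are hypothetical.

```python
def quantize(values, levels=8):
    """Map each value to an integer level in [0, levels - 1].

    Assumption: uniform binning over the observed range. The actual
    quantization used by HmbFS is not described in the abstract.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no contrast; put everything in level 0.
        return [0] * len(values)
    width = (hi - lo) / levels
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), levels - 1) for v in values]


# Intensities of one hypothetical mass-to-charge ratio across five patients.
intensities = [0.0, 1.0, 2.0, 3.0, 4.0]
compressed = quantize(intensities, levels=4)
```

Under this assumption, a feature with many distinct real-valued intensities is reduced to at most `levels` distinct symbols, which is the kind of compression that makes very wide data cheaper to store and compare.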
