Feature selection environment for genomic applications

Background: Feature selection is a pattern recognition approach for choosing important variables according to some criterion in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). Many genomic and proteomic applications rely on feature selection to answer questions such as selecting signature genes that are informative about some biological state (e.g., normal tissues and several types of cancer) or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples needed to adequately estimate the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to point out the best solution for each application.

Results: The intent of this work is to provide an open-source, multiplatform graphical environment for bioinformatics problems that supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes (targets or predictors) is also implemented in the system.

Conclusion: The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the software's effectiveness in two distinct types of biological problems. In addition, the environment can be used in other pattern recognition applications, although its main focus is bioinformatics tasks.
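
To make the pairing of a feature selection algorithm with a criterion function concrete, here is a minimal Python sketch. It is an illustration only, not the environment's implementation: it assumes sequential forward selection as the search algorithm and mean conditional entropy as the criterion function, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mean_conditional_entropy(samples, labels, features):
    """Estimate H(label | chosen features) from raw counts (lower is better)."""
    n = len(samples)
    # Group the class labels by the observed joint state of the chosen features.
    groups = {}
    for row, y in zip(samples, labels):
        key = tuple(row[f] for f in features)
        groups.setdefault(key, Counter())[y] += 1
    total = 0.0
    for counts in groups.values():
        m = sum(counts.values())
        # Entropy of the label distribution within this joint state.
        h = -sum((c / m) * log2(c / m) for c in counts.values())
        total += (m / n) * h  # weight by the state's relative frequency
    return total


def forward_selection(samples, labels, k):
    """Greedy sequential forward selection: add one feature at a time,
    always picking the one that minimizes the criterion."""
    selected = []
    remaining = list(range(len(samples[0])))
    for _ in range(k):
        best = min(
            remaining,
            key=lambda f: mean_conditional_entropy(samples, labels, selected + [f]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: three binary features; the label copies feature 1.
samples = [(0, 0, 1), (1, 0, 0), (0, 1, 1), (1, 1, 0), (0, 1, 0), (1, 0, 1)]
labels = [0, 0, 1, 1, 1, 0]

print(forward_selection(samples, labels, 1))
```

With few samples, most joint states of a larger feature subset are observed only once, so the estimated entropy drops toward zero regardless of true relevance. This is the small-sample estimation problem described in the Background.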
