KODAMA: an R package for knowledge discovery and data mining

Summary: KODAMA, a novel learning algorithm for unsupervised feature extraction, is specifically designed for analysing noisy and high-dimensional datasets. Here we present an R package that implements the algorithm, with additional functions that aid the interpretation of high-dimensional data. The package requires no additional software and runs on all major platforms.

Availability and Implementation: KODAMA is freely available from the R archive CRAN (http://cran.r-project.org). The software is distributed under the GNU General Public License (version 3 or later).

Contact: s.cacciatore@imperial.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
