The WEKA data mining software: an update

More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, has evolved substantially, and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4), released in 2003.
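Beyond its graphical interfaces, the workbench is driven programmatically through its Java API. As a flavour of that usage (not drawn from the paper itself), the minimal sketch below trains WEKA's J48 decision tree learner and cross-validates it on a dataset; it assumes weka.jar (3.6) on the classpath, and the file name iris.arff is a placeholder for any ARFF dataset.

```java
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaSketch {
    public static void main(String[] args) throws Exception {
        // Load a dataset in WEKA's ARFF format (file name is a placeholder)
        Instances data = new DataSource("iris.arff").getDataSet();
        // By convention, treat the last attribute as the class to predict
        data.setClassIndex(data.numAttributes() - 1);

        // J48 is WEKA's implementation of the C4.5 decision tree learner
        J48 tree = new J48();

        // Estimate accuracy with 10-fold cross-validation (fixed seed for repeatability)
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new java.util.Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

The same pattern applies to any of the learning schemes listed below: every classifier shares the weka.classifiers.Classifier interface, so swapping J48 for another scheme changes only the constructor line.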

[1] Leo Breiman et al. Classification and Regression Trees, 1984.

[2] Jeffrey Scott Vitter et al. Random sampling with a reservoir, 1985, TOMS.

[3] Yorick Wilks et al. GATE: an environment to support research and development in natural language engineering, 1996, Proceedings of the Eighth IEEE International Conference on Tools with Artificial Intelligence.

[4] Thomas G. Dietterich et al. Solving the Multiple Instance Problem with Axis-Parallel Rectangles, 1997, Artif. Intell.

[5] Ian H. Witten et al. Stacking Bagged and Dagged Models, 1997, ICML.

[6] Carl Gutwin et al. KEA: practical automatic keyphrase extraction, 1999, DL '99.

[7] Ian H. Witten et al. Data Mining, 2000.

[8] Naftali Tishby et al. Unsupervised document classification using sequential information maximization, 2002, SIGIR '02.

[9] Ian H. Witten et al. Data mining: practical machine learning tools and techniques with Java implementations, 2002, SIGMOD Record.

[10] Jinyuan You et al. CLOPE: a fast and effective clustering algorithm for transactional data, 2002, KDD.

[11] Xin Xu et al. Statistical Learning in Multiple Instance Problems, 2003.

[12] Kristin P. Bennett et al. An Optimization Perspective on Kernel Partial Least Squares Regression, 2003.

[13] Peter A. Flach et al. Comparative Evaluation of Approaches to Propositionalization, 2003, ILP.

[14] João Gama et al. Functional Trees, 2001, Machine Learning.

[15] Matthias W. Seeger et al. Gaussian Processes for Machine Learning, 2004, Int. J. Neural Syst.

[16] Stefan Kramer et al. Ensembles of nested dichotomies for multi-class problems, 2004, ICML.

[17] Bertram Ludäscher et al. Kepler: an extensible system for design and execution of scientific workflows, 2004, Proceedings of the 16th International Conference on Scientific and Statistical Database Management.

[18] Domenico Talia et al. Weka4WS: A WSRF-Enabled Weka Toolkit for Distributed Data Mining on Grids, 2005, PKDD.

[19] Stefan Kramer et al. Ensembles of Balanced Nested Dichotomies for Multi-class Problems, 2005, PKDD.

[20] Juan José Rodríguez Diez et al. Rotation Forest: A New Classifier Ensemble Method, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21] Ingo Mierswa et al. YALE: rapid prototyping for complex data mining tasks, 2006, KDD '06.

[22] Liangxiao Jiang et al. Weightily averaged one-dependence estimators, 2006.

[23] Geoffrey I. Webb et al. Efficient lazy elimination for averaged one-dependence estimators, 2006, ICML.

[24] Kay Nieselt et al. Mayday - a microarray data analysis workbench, 2006, Bioinformatics.

[25] Haijia Shi. Best-first Decision Tree Learning, 2007.

[26] David Madigan et al. Large-Scale Bayesian Logistic Regression for Text Categorization, 2007, Technometrics.

[27] Ralf Zimmer et al. BioWeka - extending the Weka framework for bioinformatics, 2007, Bioinformatics.

[28] Fionn Murtagh et al. The Haar Wavelet Transform of a Dendrogram, 2006, J. Classif.

[29] Chih-Jen Lin et al. LIBLINEAR: A Library for Large Linear Classification, 2008, J. Mach. Learn. Res.

[30] Eibe Frank et al. Combining Naive Bayes and Decision Tables, 2008, FLAIRS.

[31] Stan Matwin et al. Discriminative parameter learning for Bayesian networks, 2008, ICML '08.

[32] Chih-Jen Lin et al. LIBSVM: A library for support vector machines, 2011, TIST.