Surveying the Field: Current Data Mining Applications, Analytic Tools, and Practical Challenges

Rexer Analytics conducted a 27-item survey of data miners to assess the algorithms and tools currently in use, the priorities considered in selecting these tools, the types of data analyzed, and the challenges facing the data mining community. The survey was conducted online in early 2007 and received 314 responses from individuals in 35 countries. Respondents employed by tool vendors (n=100) were removed prior to most analyses. The remaining 214 data miners reported that predictive modeling and segmentation/clustering are the most common types of analyses conducted. The most commonly used algorithms are regression, decision trees, and cluster analysis. The top challenges facing data miners are dirty data, data access, and explaining data mining to others. SPSS, SPSS Clementine, and SAS are the three most frequently used analytic tools. Comparisons of reported 2006 use and planned 2007 use show increasing interest in the Oracle Data Mining tool and decreasing interest in C4.5/C5.0/See5. The primary factors data miners consider when selecting an analytic tool are: 1) the dependability and stability of the software, 2) the ability to handle large data sets, and 3) data manipulation capabilities. These main findings vary somewhat by the domain in which the data miner works, the tools used, geography, and several other dimensions.
