Design of Heuristic Decision Tree (HDT) Using Human Knowledge

Abstract: Data mining is the process of extracting hidden patterns from collected data. Because collected data serve as the basis for prediction and recommendation, incorrect data must be identified to improve the quality of the analysis results. Existing methods for identifying unexpected data mainly rely on statistics or on simple distances between data points. Because these methods do not consider the environment and characteristics of the data, they can exclude even meaningful data from the analysis. This study proposes a method that assigns weight values to human heuristic knowledge by comparing that knowledge with the collected data, and that uses these weights to create a decision tree. Data discrimination with the proposed method is more credible because human knowledge is reflected in the created tree. The validity of the proposed method is verified through an experiment.

Key Words: Heuristic Decision Tree, Human-Knowledge Data Mining, Outlier Data Reduction
