TREATMENT LEARNING: IMPLEMENTATION AND APPLICATION

Data mining and machine learning focus on inducing previously unknown, potentially useful, and ultimately understandable information from data. In this master's thesis, we propose a new learning approach called treatment learning. Treatment learning aims to find a small number of control variables in a large option space that lead to better system behavior. It addresses two central issues in data mining: (1) the understandability of learnt theories, and (2) how learnt theories can benefit decision making. We design and implement a novel mining algorithm and deliver two treatment learners that are freely downloadable from an online distribution. We describe the implementation details of both learners and compare them through an analysis of their algorithmic performance. We conduct extensive experiments and case studies demonstrating that the treatment learner can find a small number of control variables that constrain the option space to a tight, near-optimal convergence. We also compare treatment learning with other learning schemes in the framework of feature subset selection for supervised classification: our treatment learner selects smaller feature subsets than most other methods with minimal or no loss in classification accuracy. Through collaboration with other researchers, the treatment learner has been successfully applied in various research domains. Using four examples, we show general paradigms for applying it to decision making.
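To make the idea of a "treatment" concrete, here is a minimal sketch. It assumes discretized attributes, a class label per example, and a positive numeric score for each class where a higher score means better system behavior. A treatment is a small conjunction of attribute=value constraints, and its lift is the mean class score of the examples it selects divided by the mean score of all examples. The names `best_treatment`, `max_size`, and `min_support` are hypothetical. The brute-force search over small conjunctions is a simplified stand-in chosen for clarity. It is not the mining algorithm implemented in the thesis's two learners.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Row = Dict[str, str]
Treatment = Tuple[Tuple[str, str], ...]


def worth(rows: List[Row], scores: Dict[str, float], target: str) -> float:
    """Mean class score of a set of rows (higher means better behavior)."""
    return sum(scores[r[target]] for r in rows) / len(rows)


def matches(row: Row, treatment: Treatment) -> bool:
    """True if the row satisfies every attribute=value constraint."""
    return all(row[attr] == val for attr, val in treatment)


def best_treatment(rows: List[Row], scores: Dict[str, float], target: str,
                   max_size: int = 2,
                   min_support: float = 0.1) -> Tuple[Treatment, float]:
    """Try every conjunction of up to max_size attribute=value pairs and
    return (treatment, lift) with the highest lift over the baseline.
    Smaller treatments are tried first, so they win ties."""
    baseline = worth(rows, scores, target)
    if baseline <= 0:
        raise ValueError("class scores are assumed to be positive")
    # Candidate constraints: every attribute=value pair seen in the data.
    candidates = sorted({(a, v) for r in rows for a, v in r.items() if a != target})
    best: Treatment = ()
    best_lift = 1.0  # an empty treatment leaves behavior unchanged
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            if len({a for a, _ in combo}) < size:
                continue  # two values for one attribute can never both hold
            subset = [r for r in rows if matches(r, combo)]
            if len(subset) < min_support * len(rows):
                continue  # too few supporting examples to trust the lift
            lift = worth(subset, scores, target) / baseline
            if lift > best_lift:
                best, best_lift = combo, lift
    return best, best_lift


# Toy data (hypothetical): two attributes and a "low"/"high" outcome class.
data = [
    {"a": "x", "b": "p", "cls": "low"},
    {"a": "x", "b": "q", "cls": "low"},
    {"a": "y", "b": "p", "cls": "high"},
    {"a": "y", "b": "q", "cls": "high"},
    {"a": "y", "b": "q", "cls": "low"},
]
scores = {"low": 1.0, "high": 4.0}

print(best_treatment(data, scores, "cls", max_size=2, min_support=0.4))
```

On this toy data, requiring each treatment to cover at least 40% of the examples makes the search return the single constraint `a=y`, with a lift of 3.0/2.2. Lowering `min_support` to 0.2 lets the two-constraint treatment `a=y, b=p` win instead, even though only one example supports it. This illustrates why some support threshold is needed to keep treatments small and trustworthy.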
