Paper information

Data Mining Using MLC++: A Machine Learning Library in C++

Data mining algorithms, including machine learning, statistical analysis, and pattern recognition techniques, can greatly improve our understanding of the data warehouses that are becoming increasingly widespread. In this paper, we focus on classification algorithms and review the need for multiple classification algorithms. We describe a system called MLC++, which was designed to help choose the appropriate classification algorithm for a given dataset by making it easy to compare the utility of different algorithms on a specific dataset of interest. MLC++ not only provides a workbench for such comparisons, but also provides a library of C++ classes to aid in the development of new algorithms, especially hybrid and multi-strategy algorithms, which are generally hard to code from scratch. We discuss design issues, interfaces to other programs, and visualization of the resulting classifiers.
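To make the idea of a comparison workbench concrete, here is a minimal sketch of the kind of experiment MLC++ is described as automating. Several classifiers share one common interface and are compared by cross-validated accuracy on the same dataset. The class and function names (`Inducer`, `MajorityInducer`, `NearestNeighborInducer`, `crossValidatedAccuracy`) are hypothetical illustrations and are not MLC++'s actual API. Folds are assigned deterministically (instance `i` goes to fold `i % k`) rather than shuffled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A labeled instance: numeric attribute values plus an integer class label.
struct Instance {
    std::vector<double> x;
    int label;
};

// Hypothetical shared interface (NOT the real MLC++ class hierarchy):
// every learning algorithm is trained on a dataset and then predicts labels.
class Inducer {
public:
    virtual ~Inducer() = default;
    virtual std::string name() const = 0;
    virtual void train(const std::vector<Instance>& data) = 0;
    virtual int predict(const std::vector<double>& x) const = 0;
};

// Baseline: always predicts the most frequent training label
// (ties go to the smallest label, because std::map iterates in key order).
class MajorityInducer : public Inducer {
    int majority_ = 0;
public:
    std::string name() const override { return "majority"; }
    void train(const std::vector<Instance>& data) override {
        std::map<int, int> counts;
        for (const auto& inst : data) ++counts[inst.label];
        int best = -1;
        for (const auto& entry : counts)
            if (entry.second > best) { best = entry.second; majority_ = entry.first; }
    }
    int predict(const std::vector<double>&) const override { return majority_; }
};

// 1-nearest-neighbor classifier using squared Euclidean distance.
class NearestNeighborInducer : public Inducer {
    std::vector<Instance> memory_;
public:
    std::string name() const override { return "1-NN"; }
    void train(const std::vector<Instance>& data) override { memory_ = data; }
    int predict(const std::vector<double>& x) const override {
        assert(!memory_.empty());
        double bestDist = -1.0;
        int bestLabel = memory_.front().label;
        for (const auto& inst : memory_) {
            double d = 0.0;
            for (std::size_t i = 0; i < x.size(); ++i) {
                double diff = x[i] - inst.x[i];
                d += diff * diff;
            }
            if (bestDist < 0.0 || d < bestDist) { bestDist = d; bestLabel = inst.label; }
        }
        return bestLabel;
    }
};

// k-fold cross-validation: train on k-1 folds, test on the held-out fold,
// and return the fraction of all instances that were classified correctly.
double crossValidatedAccuracy(Inducer& inducer, const std::vector<Instance>& data,
                              std::size_t k) {
    assert(k >= 2 && k <= data.size());
    std::size_t correct = 0;
    for (std::size_t fold = 0; fold < k; ++fold) {
        std::vector<Instance> trainSet, testSet;
        for (std::size_t i = 0; i < data.size(); ++i) {
            if (i % k == fold) testSet.push_back(data[i]);
            else trainSet.push_back(data[i]);
        }
        inducer.train(trainSet);
        for (const auto& inst : testSet)
            if (inducer.predict(inst.x) == inst.label) ++correct;
    }
    return static_cast<double>(correct) / static_cast<double>(data.size());
}
```

Because every algorithm implements the same interface, adding a new algorithm to the comparison only requires a new subclass. A hybrid algorithm could also wrap another `Inducer` (for example, a feature-selection wrapper around 1-NN), which is the kind of reuse the abstract attributes to a class library.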
