Machine Learning: ECML 2000

Overfitting is often considered the central problem in machine learning and data mining. When good performance on training data is not enough to reliably predict good generalization, researchers and practitioners often invoke "Occam's razor" to select among hypotheses: prefer the simplest hypothesis consistent with the data. Occam's razor has a long history in science, but a mass of recent evidence suggests that in most cases it is outperformed by methods that deliberately produce more complex models. The poor performance of Occam's razor can be largely traced to its failure to account for the search process by which hypotheses are obtained: by effectively assuming that the hypothesis space is exhaustively searched, complexity-based methods tend to over-penalize large spaces. This talk describes how information about the search process can be taken into account when evaluating hypotheses. The expected generalization error of a hypothesis is computed as a function of the search steps leading to it. Two variations of this "process-oriented" approach have yielded significant improvements in the accuracy of a rule learner. Process-oriented evaluation leads to the seemingly paradoxical conclusion that the same hypothesis will have different expected generalization errors depending on how it was generated. I believe that this is as it should be, and that a corresponding shift in our way of thinking about inductive learning is required.
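The dependence on the search process can be illustrated with a small simulation. The sketch below is not the estimator described in the talk; it is a hypothetical toy setup (the function names, the 20-example training set, and the random-guess hypotheses are all assumptions made for illustration). Every candidate hypothesis has a true accuracy of 0.5, yet the best training accuracy found tends to rise as more candidates are tried, so the same observed training score warrants less trust after a longer search.

```python
import random


def best_training_accuracy(n_candidates, n_examples, rng):
    """Return the highest training accuracy among random-guess hypotheses."""
    # Random binary labels: no hypothesis can truly do better than chance.
    labels = [rng.randint(0, 1) for _ in range(n_examples)]
    best = 0.0
    for _ in range(n_candidates):
        # Each candidate hypothesis predicts by flipping a fair coin.
        predictions = [rng.randint(0, 1) for _ in range(n_examples)]
        correct = sum(p == y for p, y in zip(predictions, labels))
        best = max(best, correct / n_examples)
    return best


def mean_best_accuracy(n_candidates, n_examples=20, trials=200, seed=0):
    """Average the best training accuracy over repeated simulated searches."""
    rng = random.Random(seed)
    total = sum(
        best_training_accuracy(n_candidates, n_examples, rng)
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(mean_best_accuracy(n), 3))
```

In this toy setting the gap between the selected hypothesis's training accuracy and its true accuracy of 0.5 grows with the number of candidates examined, which is one simple way to see why an evaluation that ignores how a hypothesis was reached can be misleading.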


[145]  R. Wille Concept lattices and conceptual knowledge systems , 1992 .

[146]  John J. Grefenstette,et al.  A Coevolutionary Approach to Learning Sequential Decision Rules , 1995, ICGA.

[147]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[148]  Mario Lenz,et al.  Case-Based Reasoning: Survey and Future Directions , 1999, XPS.

[149]  Dana Ron,et al.  On the learnability and usage of acyclic probabilistic finite automata , 1995, COLT '95.

[150]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[151]  Marek Kretowski,et al.  An Evolutionary Algorithm Using Multivariate Discretization for Decision Rule Induction , 1999, PKDD.

[152]  Jorma Rissanen,et al.  Stochastic Complexity in Learning , 1995, J. Comput. Syst. Sci..

[153]  Jonathan Schaeffer,et al.  Learning to Play Strong Poker , 1999, ICML 1999.

[154]  David Yarowsky,et al.  DECISION LISTS FOR LEXICAL AMBIGUITY RESOLUTION: Application to Accent Restoration in Spanish and French , 1994, ACL.

[155]  Yoav Freund,et al.  Boosting the margin: A new explanation for the effectiveness of voting methods , 1997, ICML.

[156]  Daniel Boley,et al.  Principal Direction Divisive Partitioning , 1998, Data Mining and Knowledge Discovery.

[157]  Padhraic Smyth,et al.  From Data Mining to Knowledge Discovery: An Overview , 1996, Advances in Knowledge Discovery and Data Mining.

[158]  João Gama,et al.  Characterization of Classification Algorithms , 1995, EPIA.

[159]  Heikki Mannila,et al.  Discovery of Frequent Episodes in Event Sequences , 1997, Data Mining and Knowledge Discovery.

[160]  J. R. Quinlan,et al.  Comparing connectionist and symbolic learning methods , 1994, COLT 1994.

[161]  Peter C. Cheeseman,et al.  Bayesian Classification (AutoClass): Theory and Results , 1996, Advances in Knowledge Discovery and Data Mining.

[162]  Thomas G. Dietterich,et al.  Learning with Many Irrelevant Features , 1991, AAAI.

[163]  Carla E. Brodley,et al.  Approaches to Online Learning and Concept Drift for User Identification in Computer Security , 1998, KDD.

[164]  Alexander Gammerman,et al.  Complexity Approximation Principle , 1999, Comput. J..

[165]  Bill Curtis,et al.  Process modeling , 1992, CACM.

[166]  Isabelle Guyon,et al.  On-line cursive script recognition using time-delay neural networks and hidden Markov models , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[167]  Ulrich H.-G. Kreßel,et al.  Pairwise classification and support vector machines , 1999 .

[168]  Andrew W. Moore,et al.  Locally Weighted Learning for Control , 1997, Artificial Intelligence Review.

[169]  Claude Sammut,et al.  A Framework for Behavioural Cloning , 1995, Machine Intelligence 15.

[170]  Nicolas Lachiche,et al.  Scope Classification: An Instance-Based Learning Algorithm with a Rule-Based Characterisation , 1998, ECML.

[171]  Shlomo Argamon,et al.  A Memory-Based Approach to Learning Shallow Natural Language Patterns , 1998, ACL.

[172]  James F. Allen Maintaining knowledge about temporal intervals , 1983, CACM.

[173]  Emmanuelle Martienne,et al.  Learning Logical Descriptions for Document Understanding: A Rough Sets-Based Approach , 1998, Rough Sets and Current Trends in Computing.

[174]  Pattie Maes,et al.  Emergent Hierarchical Control Structures: Learning Reactive/Hierarchical Relationships in Reinforcement Environments , 1996 .

[175]  Rajesh Parekh,et al.  Automata Induction, Grammar Inference, and Language Acquisition , 2000 .

[176]  Serguei V. S. Pakhomov Modeling Filled Pauses in Medical Dictations , 1999, ACL.

[177]  Nicolas Pasquier,et al.  Efficient Mining of Association Rules Using Closed Itemset Lattices , 1999, Inf. Syst..

[178]  Barry Smyth,et al.  Remembering To Forget: A Competence-Preserving Case Deletion Policy for Case-Based Reasoning Systems , 1995, IJCAI.

[179]  Robert Tibshirani,et al.  Bias, Variance and Prediction Error for Classification Rules , 1996 .

[180]  Barry Smyth,et al.  Footprint-Based Retrieval , 1999, ICCBR.

[181]  Yiming Yang,et al.  A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[182]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[183]  Eddy Mayoraz,et al.  Improved Pairwise Coupling Classification with Correcting Classifiers , 1998, ECML.

[184]  J. Neumann Zur Theorie der Gesellschaftsspiele , 1928 .

[185]  Azriel Rosenfeld,et al.  Grammatical inference by hill climbing , 1976, Inf. Sci..

[186]  Hiroaki Kitano,et al.  The RoboCup Synthetic Agent Challenge 97 , 1997, IJCAI.

[187]  Ted Pedersen,et al.  Knowledge Lean Word-Sense Disambiguation , 1997, AAAI/IAAI.

[188]  Thomas G. Dietterich The MAXQ Method for Hierarchical Reinforcement Learning , 1998, ICML.

[189]  James F. Allen Towards a General Theory of Action and Time , 1984, Artif. Intell..

[190]  Ursula Gather,et al.  Analysis of High Dimensional Data from Intensive Care Medicine , 1998, COMPSTAT.

[191]  Paul Wang,et al.  Applications for the lifetime value model in modern newspaper publishing , 1995 .

[192]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[193]  Manuela M. Veloso,et al.  Team-partitioned, opaque-transition reinforcement learning , 1999, AGENTS '99.

[194]  Richard S. Sutton,et al.  Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[195]  Barry Smyth,et al.  Building Compact Competent Case-Bases , 1999, ICCBR.

[196]  Marti A. Hearst Automated Discovery of WordNet Relations , 2004 .

[197]  Thomas G. Dietterich Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms , 1998, Neural Computation.

[198]  Dimitrios Gunopulos,et al.  Mining Process Models from Workflow Logs , 1998, EDBT.

[199]  Dimitris Karagiannis,et al.  Integrating machine learning and workflow management to support acquisition and adaptation of workflow models , 2000, Intell. Syst. Account. Finance Manag..

[200]  S. Pattinson,et al.  Learning to fly. , 1998 .

[201]  Ron Kohavi,et al.  Error-Based and Entropy-Based Discretization of Continuous Features , 1996, KDD.

[202]  L. Darrell Whitley,et al.  Genetic Approach to Feature Selection for Ensemble Creation , 1999, GECCO.

[203]  Dimitris Meretakis,et al.  Classification as Mining and Use of Labeled Itemsets , 1999, 1999 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[204]  Ramón López de Mántaras,et al.  A distance-based attribute selection measure for decision tree induction , 1991, Machine Learning.

[205]  Kamal Ali,et al.  Partial Classification Using Association Rules , 1997, KDD.

[206]  Adele E. Howe,et al.  Modelling Discrete Event Sequences as State Transition Diagrams , 1997, IDA.

[207]  Guijun Wang,et al.  ProFusion*: Intelligent Fusion from Multiple, Distributed Search Engines , 1996, J. Univers. Comput. Sci..

[208]  David B. Searls,et al.  Linguistic approaches to biological sequences , 1997, Comput. Appl. Biosci..

[209]  Susan T. Dumais,et al.  Inductive learning algorithms and representations for text categorization , 1998, CIKM '98.

[210]  Aram Karalic,et al.  Employing Linear Regression in Regression Tree Leaves , 1992, ECAI.

[211]  Pat Langley,et al.  Induction of Selective Bayesian Classifiers , 1994, UAI.

[212]  Guergana Savova,et al.  Filled Pause Distribution and Modeling in Quasi-Spontaneous Speech , 2002 .

[213]  Douglas E. Appelt,et al.  FASTUS: A System for Extracting Information from Natural-Language Text , 1992 .

[214]  Michael W. Berry,et al.  Low-rank Orthogonal Decompositions for Information Retrieval Applications , 1995, Numer. Linear Algebra Appl..

[215]  Yiming Yang,et al.  A re-examination of text categorization methods , 1999, SIGIR '99.

[216]  Yoram Singer,et al.  Boosting Applied to Tagging and PP Attachment , 1999, EMNLP.

[217]  L. Wehenkel On uncertainty measures used for decision tree induction , 1996 .

[218]  Douglas H. Fisher,et al.  Iterative Optimization and Simplification of Hierarchical Clusterings , 1996, J. Artif. Intell. Res..

[219]  Alexander Gammerman,et al.  Ridge Regression Learning Algorithm in Dual Variables , 1998, ICML.

[220]  Cathy H. Wu,et al.  Neural networks for full-scale protein sequence classification: Sequence encoding with singular value decomposition , 1995, Machine Learning.

[221]  David C. Wilson,et al.  Categorizing Case-Base Maintenance: Dimensions and Directions , 1998, EWCBR.

[222]  Manuela M. Veloso,et al.  Task Decomposition, Dynamic Role Assignment, and Low-Bandwidth Communication for Real-Time Strategic Teamwork , 1999, Artif. Intell..

[223]  Vidroha Debroy,et al.  Genetic Programming , 1998, Lecture Notes in Computer Science.

[224]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[225]  Olivier Gascuel,et al.  Hidden Markov Models with Patterns to Learn Boolean Vector Sequences and Application to the Built-In Self-Test for Integrated Circuits , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[226]  Jonathan Schaeffer,et al.  Poker as a Testbed for Machine Intelligence Research , 1998 .

[227]  S. Muggleton,et al.  Protein secondary structure prediction using logic-based machine learning. , 1992, Protein engineering.

[228]  S. Hyakin,et al.  Neural Networks: A Comprehensive Foundation , 1994 .

[229]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[230]  Ian Frank,et al.  Soccer Server: A Tool for Research on Multiagent Systems , 1998, Appl. Artif. Intell..

[231]  Craig A. Knoblock,et al.  Wrapper generation for semi-structured Internet sources , 1997, SGMD.

[232]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[233]  R. Clarke,et al.  Theory and Applications of Correspondence Analysis , 1985 .

[234]  Pedro M. Domingos The Role of Occam's Razor in Knowledge Discovery , 1999, Data Mining and Knowledge Discovery.

[235]  Feller William,et al.  An Introduction To Probability Theory And Its Applications , 1950 .

[236]  Johan de Kleer,et al.  A Qualitative Physics Based on Confluences , 1984, Artif. Intell..

[237]  Sholom M. Weiss,et al.  Small Sample Decision tree Pruning , 1994, ICML.

[238]  Barry Smyth,et al.  Modelling the Competence of Case-Bases , 1998, EWCBR.

[239]  K. McConway Distribution-free Tests, H.R. Neave, P.L. Worthington. Unwin Hyman, London (1988), xvi, +430. Price £40.00 hardback, £14.95 paperback , 1989 .

[240]  寺野 隆雄 Quantitative Results Concerning the Utility of Explanation-Based Learning , 1989 .

[241]  L. Breiman Arcing Classifiers , 1998 .

[242]  Claire Cardie,et al.  Using Decision Trees to Improve Case-Based Learning , 1993, ICML.

[243]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[244]  D. Hofstadter Metamagical Themas: Questing for the Essence of Mind and Pattern , 1985 .

[245]  Christian Homburg,et al.  Cross-Validation and Information Criteria in Causal Modeling , 1991 .

[246]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[247]  Gerard Salton,et al.  Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer , 1989 .

[248]  Salvatore J. Stolfo,et al.  Toward Cost-Sensitive Modeling for Intrusion Detection , 2000 .

[249]  José Oncina,et al.  Learning Stochastic Regular Grammars by Means of a State Merging Method , 1994, ICGI.

[250]  Chris Carter,et al.  Assessing Credit Card Applications Using Machine Learning , 1987, IEEE Expert.

[251]  Janet L. Kolodner,et al.  Reconstructive Memory: A Computer Model , 1983, Cogn. Sci..

[252]  Agnar Aamodt,et al.  Case-Based Reasoning Research and Development , 1995, Lecture Notes in Computer Science.

[253]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[254]  Hector Garcia-Molina,et al.  Extracting Semistructured Information from the Web. , 1997 .

[255]  Astro Teller,et al.  Evolving Team Darwin United , 1998, RoboCup.

[256]  Peter J. Angeline,et al.  Competitive Environments Evolve Better Solutions for Complex Tasks , 1993, ICGA.

[257]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[258]  Tony R. Martinez,et al.  Improved Heterogeneous Distance Functions , 1996, J. Artif. Intell. Res..

[259]  Francesco Bergadano,et al.  Inductive Logic Programming: From Machine Learning to Software Engineering , 1995 .

[260]  Belur V. Dasarathy,et al.  Nearest neighbor (NN) norms: NN pattern classification techniques , 1991 .

[261]  James C. Bezdek,et al.  Semi-supervised Point Prototype Clustering , 1998, Int. J. Pattern Recognit. Artif. Intell..

[262]  Ronald J. Brachman,et al.  The Process of Knowledge Discovery in Databases , 1996, Advances in Knowledge Discovery and Data Mining.

[263]  Yoram Biberman,et al.  A Context Similarity Measure , 1994, ECML.

[264]  Pedro M. Domingos MetaCost: a general method for making classifiers cost-sensitive , 1999, KDD '99.

[265]  Stephanie Forrest,et al.  A sense of self for Unix processes , 1996, Proceedings 1996 IEEE Symposium on Security and Privacy.

[266]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[267]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[268]  Ming Li,et al.  An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.

[269]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[270]  Van Rijsbergen,et al.  A theoretical basis for the use of co-occurence data in information retrieval , 1977 .

[271]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[272]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[273]  Luis Talavera,et al.  Feature Selection as Retrospective Pruning in Hierarchical Clustering , 1999, IDA.

[274]  Vivian Borst  Unsupervised Clustering: A Fast Scalable Method for Large Datasets , 1999 .

[275]  Tin Kam Ho,et al.  The Random Subspace Method for Constructing Decision Forests , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[276]  Barry Smyth,et al.  Case-Base Maintenance , 1998, IEA/AIE.

[277]  Avi Pfeffer,et al.  Representations and Solutions for Game-Theoretic Problems , 1997, Artif. Intell..

[278]  Luis Talavera,et al.  Feature Selection as a Preprocessing Step for Hierarchical Clustering , 1999, ICML.

[279]  Oscar H. Ibarra,et al.  Polynomially Complete Fault Detection Problems , 1975, IEEE Transactions on Computers.

[280]  Kai Ming Ting,et al.  Boosting Cost-Sensitive Trees , 1998, Discovery Science.

[281]  Matthew Miller,et al.  Learning Cost-Sensitive Classification Rules for Network Intrusion Detection using RIPPER , 1999 .

[282]  Long-Ji Lin,et al.  Reinforcement learning for robots using neural networks , 1992 .

[283]  Peter Green,et al.  Markov chain Monte Carlo in Practice , 1996 .

[284]  Ellen Riloff,et al.  Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing , 1996, Lecture Notes in Computer Science.

[285]  Anders Krogh,et al.  Introduction to the theory of neural computation , 1994, The advanced book program.

[286]  Wenke Lee,et al.  A Data Mining Framework for Constructing Features and Models for Intrusion Detection Systems , 1999 .

[287]  E. Morin Extraction de liens semantiques entre termes a partir de corpus de textes techniques , 1999 .

[288]  S. Brunak,et al.  Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites , 1997 .

[289]  Stephanie Forrest,et al.  Intrusion Detection Using Sequences of System Calls , 1998, J. Comput. Secur..

[290]  Llanos Mora-López,et al.  Characterization and simulation of hourly exposure series of global radiation , 1997 .

[291]  Alexandros Kalousis,et al.  NOEMON: Design, implementation and performance results of an intelligent assistant for classifier selection , 1999, Intell. Data Anal..

[292]  João José Furtado Vasco,et al.  Determining Property Relevance in Concept Formation by Computing Correlation Between Properties , 1998, ECML.

[293]  J. J. Rocchio,et al.  Relevance feedback in information retrieval , 1971 .

[294]  Terran Lane,et al.  An Application of Machine Learning to Anomaly Detection , 1999 .

[295]  L. Baum,et al.  An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process , 1972 .

[296]  Kenneth D. Forbus Qualitative Process Theory , 1984, Artif. Intell..

[297]  Javier Bejar Alonso Adquisición de conocimiento en dominios poco estructurados , 1995 .

[298]  Kurt Hornik,et al.  Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[299]  Frederick Reichheld,et al.  Building high-loyalty business systems , 1993 .

[300]  Geoffrey E. Hinton,et al.  Feudal Reinforcement Learning , 1992, NIPS.

[301]  W. Härdle Applied Nonparametric Regression , 1992 .

[302]  Ethem Alpaydin,et al.  Support Vector Machines for Multi-class Classification , 1999, IWANN.

[303]  David W. Aha,et al.  Feature Selection for Case-Based Classification of Cloud Types: An Empirical Comparison , 1994 .

[304]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[305]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[306]  Kenji Fukumizu,et al.  Generalization Error of Linear Neural Networks in Unidentifiable Cases , 1999, ALT.

[307]  George A. Miller,et al.  Introduction to WordNet: An On-line Lexical Database , 1990 .

[308]  Alexander L. Wolf,et al.  Event-based detection of concurrency , 1998, SIGSOFT '98/FSE-6.

[309]  Nello Cristianini,et al.  Large Margin DAGs for Multiclass Classification , 1999, NIPS.

[310]  B. Morris The Service Profit Chain: How Leading Companies Link Profit and Growth to Loyalty, Satisfaction, and Value , 1998 .

[311]  Pedro M. Domingos A Process-Oriented Heuristic for Model Selection , 1998, ICML.

[312]  Nada Lavrac,et al.  Cost-Sensitive Feature Reduction Applied to a Hybrid Genetic Algorithm , 1996, ALT.

[313]  Alberto O. Mendelzon,et al.  Database techniques for the World-Wide Web: a survey , 1998, SIGMOD Rec..

[314]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[315]  Jerome H. Friedman Multivariate adaptive regression splines (with discussion) , 1991 .

[316]  Hong-Yeop Song,et al.  A New Criterion in Selection and Discretization of Attributes for the Generation of Decision Trees , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[317]  Manuela M. Veloso,et al.  The CMUnited-98 Champion Simulator Team , 1998, RoboCup.

[318]  Heikki Mannila,et al.  Fast Discovery of Association Rules , 1996, Advances in Knowledge Discovery and Data Mining.

[319]  Godfried T. Toussaint,et al.  Bibliography on estimation of misclassification , 1974, IEEE Trans. Inf. Theory.

[320]  Dana Ron,et al.  Learning probabilistic automata with variable memory length , 1994, COLT '94.

[321]  Rodney A. Brooks,et al.  Learning to Coordinate Behaviors , 1990, AAAI.

[322]  Jorma Rissanen,et al.  MDL-Based Decision Tree Pruning , 1995, KDD.

[323]  Alexander L. Wolf,et al.  Automating Process Discovery through Event-Data Analysis , 1995, 1995 17th International Conference on Software Engineering.

[324]  LiMin Fu,et al.  Neural networks in computer intelligence , 1994 .

[325]  Mario Lenz,et al.  Case Retrieval Nets: Basic Ideas and Extensions , 1996, KI.

[326]  Salvatore J. Stolfo,et al.  Mining in a data-flow environment: experience in network intrusion detection , 1999, KDD '99.

[327]  James Kelly,et al.  AutoClass: A Bayesian Classification System , 1993, ML.

[328]  Kazuya Takeda,et al.  Models and analysis of vocal emissions for biomedical applications : 3rd International workshop ... , 2003 .

[329]  Francisco Casacuberta Some Relations Among Stochastic Finite State Networks Used in Automatic Speech Recognition , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[330]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[331]  Thomas G. Dietterich,et al.  A Comparative Study of ID3 and Backpropagation for English Text-to-Speech Mapping , 1990, ML.

[332]  Dimitri P. Bertsekas,et al.  Dynamic Programming: Deterministic and Stochastic Models , 1987 .

[333]  Tom Fawcett,et al.  Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions , 1997, KDD.

[334]  Steven J. Fenves,et al.  Applying AI clustering to engineering tasks , 1993, IEEE Expert.

[335]  Raymond J. Mooney,et al.  Relational Learning of Pattern-Match Rules for Information Extraction , 1999, CoNLL.

[336]  Pat Langley,et al.  Elements of Machine Learning , 1995 .

[337]  Ivan Bratko,et al.  Reconstructing Human Skill with Machine Learning , 1994, European Conference on Artificial Intelligence.

[338]  Ron Kohavi,et al.  Irrelevant Features and the Subset Selection Problem , 1994, ICML.

[339]  J. Wellner,et al.  Empirical Processes with Applications to Statistics , 2009 .

[340]  Gregory R. Grant,et al.  Bioinformatics - The Machine Learning Approach , 2000, Comput. Chem..

[341]  Darrell Whitley,et al.  Feature Selection Mechanisms for Ensemble Creation : A Genetic Search Perspective , 2003 .

[342]  L. Berkovitz The Tactical Air Game: A Multimove Game with Mixed Strategy Solution , 1975 .

[343]  G. S. Watson,et al.  Smooth regression analysis , 1964 .

[344]  Shan-Hwei Nienhuys-Cheng Distance Between Herbrand Interpretations: A Measure for Approximations to a Target Concept , 1997, ILP.

[345]  Hiroaki Kitano,et al.  RoboCup-97: Robot Soccer World Cup I , 1998, Lecture Notes in Computer Science.

[346]  Padraig Cunningham,et al.  The NeuralBAG algorithm: optimizing generalization performance in bagged neural networks , 1999, ESANN.

[347]  Ivan Bratko,et al.  Symbolic and qualitative reconstruction of control skill , 1999, Electronic Transactions on Artificial Intelligence.

[348]  David E. Goldberg,et al.  Genetic Algorithms in Search, Optimization and Machine Learning , 1988 .

[349]  Thorsten Joachims,et al.  Expected Error Analysis for Model Selection , 1999, ICML.