Aktives Lernen zur Klassifikation großer Datenmengen mittels Exploration und Spezialisierung (Active Learning for the Classification of Large Datasets through Exploration and Specialization)

The active learning paradigm is used in many application areas to classify large datasets with the help of a human expert. A selective sampling strategy can reduce the number of examples the expert must label before a stable model is obtained. Current state-of-the-art active learning algorithms often handle exploration insufficiently. They assume that a stable classification model has already been built and only needs to be refined with further, carefully selected examples. This dissertation introduces two new approaches that incorporate exploration into active learning. Unlike most other active learning methods, they apply the selection strategy from the very first example. Once a stable classification model has been built, the selection strategy focuses on the classification boundaries. The first approach, "Active Learning Vector Quantization", explores the data with a global clustering method. In a second phase, the clusters labeled by the human expert are refined further with selected examples. The second approach, "Prototype Based Active Classification", creates a new prototype for a k-nearest neighbour classifier in each learning iteration. Whether a data point is selected as a prototype depends on a combination of two factors: how representative it is of the neighboring unlabeled data points, and how uncertain the classifier is in predicting its class label. A newly developed uncertainty distribution lets the approach balance exploration and exploitation seamlessly. With each learning iteration, the influence of exploration decreases and that of exploitation increases. The approaches are demonstrated on a specific application in bioinformatics. The strategy of first exploring the dataset and then refining the classification model proves beneficial for both classification performance and classifier stability.
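The selection idea behind Prototype Based Active Classification can be illustrated with a minimal Python sketch. The abstract does not give the thesis's uncertainty distribution, so several choices here are assumptions made only for illustration:

- Representativeness is measured as a Gaussian-kernel density over the unlabeled pool.
- Uncertainty is measured as a k-NN vote agreement, damped by the distance to the nearest prototype.
- The two terms are combined linearly with a hypothetical exploitation weight `w = t / (t + tau)`.
- All names (`select_and_label`, `knn_vote`, `predict`) and parameters (`k`, `bandwidth`, `tau`) are hypothetical.

The sketch does not cover Active Learning Vector Quantization.

```python
import math
from collections import Counter
from typing import Callable, Optional

Point = tuple[float, ...]
Prototype = tuple[Point, str]  # (data point, label assigned by the expert)


def kernel(a: Point, b: Point, bandwidth: float) -> float:
    """Gaussian similarity between two points."""
    return math.exp(-math.dist(a, b) ** 2 / (2 * bandwidth ** 2))


def representativeness(x: Point, unlabeled: list[Point], bandwidth: float) -> float:
    """Parzen-style density of x over the still-unlabeled pool (assumed measure)."""
    return sum(kernel(x, u, bandwidth) for u in unlabeled) / len(unlabeled)


def knn_vote(x: Point, prototypes: list[Prototype], k: int,
             bandwidth: float) -> tuple[Optional[str], float]:
    """Return (predicted label, uncertainty in [0, 1]) from the k nearest prototypes."""
    if not prototypes:
        return None, 1.0  # nothing labeled yet: maximally uncertain everywhere
    nearest = sorted(prototypes, key=lambda p: math.dist(x, p[0]))[:k]
    label, count = Counter(lbl for _, lbl in nearest).most_common(1)[0]
    agreement = count / len(nearest)
    # Assumption: damp the vote by distance to the closest prototype, so that
    # regions no prototype covers yet count as uncertain.
    proximity = kernel(x, nearest[0][0], bandwidth)
    return label, 1.0 - agreement * proximity


def predict(x: Point, prototypes: list[Prototype], k: int = 3,
            bandwidth: float = 1.0) -> Optional[str]:
    """k-nearest neighbour classification over the labeled prototypes."""
    return knn_vote(x, prototypes, k, bandwidth)[0]


def select_and_label(pool: list[Point], oracle: Callable[[Point], str],
                     n_queries: int, k: int = 3, bandwidth: float = 1.0,
                     tau: float = 2.0) -> list[Prototype]:
    """Query one point per iteration and add it as a new labeled prototype."""
    unlabeled = list(pool)
    prototypes: list[Prototype] = []
    for t in range(min(n_queries, len(pool))):
        # Hypothetical schedule (tau > 0): the uncertainty weight rises from 0
        # toward 1, so early queries explore and later ones exploit.
        w = t / (t + tau)

        def score(x: Point) -> float:
            _, unc = knn_vote(x, prototypes, k, bandwidth)
            return (1 - w) * representativeness(x, unlabeled, bandwidth) + w * unc

        best = max(unlabeled, key=score)
        unlabeled.remove(best)
        prototypes.append((best, oracle(best)))  # the human expert supplies the label
    return prototypes
```

On two well-separated clusters, the first query (`w = 0`) lands on the densest point of the larger cluster. The second query jumps to the other cluster, because those points are both dense and far from every existing prototype. Computing the density only over the remaining unlabeled points is one simple way to keep the exploration term from repeatedly favoring an already-covered region.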
