General Framework for Cluster based Active Learning Algorithm

This paper revisits the problem of active learning and decision making when labeling is costly and unlabeled data is abundant. In many real-world applications, large amounts of data are available, but the cost of labeling them correctly prohibits their use. Active learning can be employed in such settings. Our proposed approach incorporates clustering into the active learning algorithm and achieves data reduction through feature selection. The algorithm learns incrementally, adjusting its clusters and selecting appropriate features as it explores more data points.
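As a rough illustration of the idea, the sketch below shows one round of cluster-based active learning in plain Python. It is not the paper's algorithm: the variance-based feature ranking, the use of k-means, the choice of the point nearest each centroid as the query, and the propagation of each queried label to the whole cluster are all assumptions made for the example, as are the function names (`select_features`, `kmeans`, `cluster_active_learning`). An incremental version would presumably repeat these steps as more data points are explored.

```python
import random

def variance(column):
    mean = sum(column) / len(column)
    return sum((v - mean) ** 2 for v in column) / len(column)

def select_features(points, keep):
    """Indices of the `keep` highest-variance features (assumed criterion)."""
    n_features = len(points[0])
    scores = [variance([p[j] for p in points]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means with seeded random initialisation."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the final centroids.
    return centroids, [nearest(p, centroids) for p in points]

def cluster_active_learning(points, k, keep, oracle):
    """Query one label per cluster and propagate it to the cluster's members."""
    features = select_features(points, keep)
    reduced = [[p[j] for j in features] for p in points]
    centroids, assignment = kmeans(reduced, k)
    labels = [None] * len(points)
    queries = 0
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if not members:
            continue
        # The representative is the member closest to the centroid.
        rep = min(members, key=lambda i: sq_dist(reduced[i], centroids[c]))
        label = oracle(rep)  # the only costly step: ask for a true label
        queries += 1
        for i in members:
            labels[i] = label
    return labels, queries, features

# Toy data: feature 0 separates two groups, features 1 and 2 carry little signal.
points = [[0.0, 0.5, 0.1], [0.2, 0.5, 0.2], [0.1, 0.5, 0.0],
          [10.0, 0.5, 0.1], [10.2, 0.5, 0.2], [9.9, 0.5, 0.0]]
truth = [0, 0, 0, 1, 1, 1]
labels, queries, features = cluster_active_learning(
    points, k=2, keep=1, oracle=lambda i: truth[i])
```

On this toy data the oracle is consulted once per cluster rather than once per point, which is where the labeling savings would come from when clusters align well with classes.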
