Effective Active Learning Strategies for the Use of Large-Margin Classifiers in Semantic Annotation: An Optimal Parameter Discovery Perspective

Classical supervised machine learning techniques have been explored for semantically annotating unstructured textual data, such as consumers' comments archived at social media websites, to extract business intelligence. However, these techniques often require a large number of manually labeled training examples to produce accurate annotations. Several active learning approaches built on probabilistic sequence models have been explored to minimize the number of labeled training examples needed for semantic annotation tasks. Recent research has shown that large-margin classifiers are viable alternatives for automated semantic annotation, given their strong generalization capabilities and their ability to process high-dimensional data. However, the existing active learning methods designed for probabilistic sequence models cannot be easily adapted to large-margin classifiers. The main contribution of this paper is the development of novel active learning methods for large-margin cl...
