Multi-View Hidden Markov Perceptrons

Discriminative learning techniques for sequential data have proven more effective than generative models for named entity recognition, information extraction, and other discrimination tasks. However, semi-supervised learning mechanisms that exploit inexpensive unlabeled sequences in addition to a few labeled sequences – such as the Baum-Welch algorithm – are available only for generative models. The multi-view approach is based on the principle of maximizing the consensus among multiple independent hypotheses; we develop this principle into a semi-supervised hidden Markov perceptron algorithm. Experiments reveal that the resulting procedure utilizes unlabeled data effectively and discriminates more accurately than its purely supervised counterparts.
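The consensus principle can be made concrete with a small sketch. A hidden Markov perceptron is a structured perceptron that scores label sequences with emission and transition weights and decodes with Viterbi; a multi-view variant trains one such perceptron per feature view and, on unlabeled sequences, uses each view's Viterbi prediction as a pseudo-label for the other when the views disagree. This is a simplified illustration under assumed conventions (integer observations, one shared label set, additive perceptron updates), not the paper's exact update rule; all function and class names below are invented for the sketch.

```python
# Illustrative sketch of a multi-view hidden Markov perceptron
# (assumption-laden toy version, not the authors' exact algorithm).
import numpy as np

def viterbi(obs, emit, trans, n_labels):
    """Best label sequence under scores emit[y, obs[t]] + trans[y_prev, y]."""
    T = len(obs)
    delta = np.zeros((T, n_labels))          # best score ending in label y at t
    back = np.zeros((T, n_labels), dtype=int)  # backpointers
    delta[0] = emit[:, obs[0]]
    for t in range(1, T):
        for y in range(n_labels):
            scores = delta[t - 1] + trans[:, y]
            back[t, y] = int(np.argmax(scores))
            delta[t, y] = scores[back[t, y]] + emit[y, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

class HMPerceptron:
    """Structured perceptron over label sequences (one per view)."""
    def __init__(self, n_labels, n_obs):
        self.n_labels = n_labels
        self.emit = np.zeros((n_labels, n_obs))
        self.trans = np.zeros((n_labels, n_labels))

    def predict(self, obs):
        return viterbi(obs, self.emit, self.trans, self.n_labels)

    def update(self, obs, gold):
        """Additive update: reward the gold sequence, penalize the prediction."""
        pred = self.predict(obs)
        if pred == list(gold):
            return
        for t, x in enumerate(obs):
            self.emit[gold[t], x] += 1.0
            self.emit[pred[t], x] -= 1.0
            if t > 0:
                self.trans[gold[t - 1], gold[t]] += 1.0
                self.trans[pred[t - 1], pred[t]] -= 1.0

def multi_view_train(labeled, unlabeled, n_labels, n_obs, epochs=5):
    """labeled: list of ((obs_view1, obs_view2), gold); unlabeled: list of
    (obs_view1, obs_view2) pairs. Each view sees its own observation encoding."""
    v1 = HMPerceptron(n_labels, n_obs)
    v2 = HMPerceptron(n_labels, n_obs)
    for _ in range(epochs):
        for (x1, x2), y in labeled:       # supervised perceptron steps
            v1.update(x1, y)
            v2.update(x2, y)
        for x1, x2 in unlabeled:          # consensus steps: on disagreement,
            p1, p2 = v1.predict(x1), v2.predict(x2)
            if p1 != p2:                  # each view is pulled toward its peer
                v1.update(x1, p2)
                v2.update(x2, p1)
    return v1, v2
```

The consensus step is what lets unlabeled sequences carry information: a disagreement between the views is treated as an error signal for both, driving the hypotheses toward agreement without requiring any labels.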
