Mr. MIRA: Open-Source Large-Margin Structured Learning on MapReduce

We present an open-source framework for large-scale online structured learning. Developed with the flexibility to handle cost-augmented inference problems such as statistical machine translation (SMT), our large-margin learner can be used with any decoder. Integration with MapReduce using Hadoop streaming allows efficient scaling with increasing size of training data. Although designed with a focus on SMT, the decoder-agnostic design of our learner allows easy future extension to other structured learning problems such as sequence labeling and parsing.
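
To make the terms "large-margin" and "cost-augmented inference" concrete, here is a minimal sketch of one online update over a k-best list. It is not taken from the Mr. MIRA code. It assumes a hope/fear selection rule and a passive-aggressive style clipped step size, which is one common formulation of MIRA for SMT. The names `mira_update`, `kbest`, and the step-size cap `C` are hypothetical, and the cost of each hypothesis (for example, 1 minus sentence-level BLEU) is assumed to be supplied by whatever decoder produced the list.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]
Hypothesis = Tuple[Features, float]  # (sparse feature vector, cost)


def dot(weights: Features, feats: Features) -> float:
    """Sparse dot product between a weight vector and a feature vector."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def mira_update(weights: Features, kbest: List[Hypothesis], C: float = 0.01) -> Features:
    """One large-margin update from a k-best list (illustrative sketch only).

    Assumption: hope = argmax(model score - cost), fear = argmax(model score + cost).
    The fear selection is the cost-augmented inference step.
    """
    hope = max(kbest, key=lambda h: dot(weights, h[0]) - h[1])
    fear = max(kbest, key=lambda h: dot(weights, h[0]) + h[1])

    # Feature difference between the hope and fear hypotheses.
    names = set(hope[0]) | set(fear[0])
    delta = {n: hope[0].get(n, 0.0) - fear[0].get(n, 0.0) for n in names}

    # Hinge loss: the margin should be at least the cost difference.
    loss = (fear[1] - hope[1]) - dot(weights, delta)
    norm_sq = sum(v * v for v in delta.values())
    if loss <= 0.0 or norm_sq == 0.0:
        return dict(weights)  # margin already satisfied; no change

    step = min(C, loss / norm_sq)  # clipped step size (assumed PA-I style)
    updated = dict(weights)
    for name, value in delta.items():
        updated[name] = updated.get(name, 0.0) + step * value
    return updated
```

Because the function only consumes feature vectors and costs, it does not depend on how the k-best list was produced, which is the sense in which such a learner can be decoder-agnostic.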

Jimmy J. Lin | Vladimir Eidelman | Philip Resnik | Ferhan Türe | Ke Wu

[1] Koby Crammer, et al. Online Large-Margin Training of Dependency Parsers, 2005, ACL.

[2] Koby Crammer, et al. Adaptive regularization of weight vectors, 2009, Machine Learning.

[3] Vladimir Eidelman, et al. cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models, 2010, ACL.

[4] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[5] Stanley F. Chen, et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.

[6] Philipp Koehn, et al. SampleRank Training for Phrase-Based Machine Translation, 2011, WMT@EMNLP.

[7] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.

[8] Philipp Koehn, et al. Analysing the Effect of Out-of-Domain Data on SMT Systems, 2012, WMT@NAACL-HLT.

[9] Michael Collins, et al. Ranking Algorithms for Named Entity Extraction: Boosting and the Voted Perceptron, 2002, ACL.

[10] Christopher D. Manning, et al. Fast and Adaptive Online Training of Feature-Rich Translation Models, 2013, ACL.

[11] Ralph Weischedel, et al. A Study of Translation Error Rate with Targeted Human Annotation, 2005.

[12] Matthew G. Snover, et al. A Study of Translation Edit Rate with Targeted Human Annotation, 2006, AMTA.

[13] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[14] Kevin Knight, et al. 11,001 New Features for Statistical Machine Translation, 2009, NAACL.

[15] Chris Dyer, et al. Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT, 2012, ACL.

[16] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[17] David Chiang, et al. Hope and Fear for Discriminative Training of Statistical Translation Models, 2012, J. Mach. Learn. Res.

[18] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[19] Koby Crammer, et al. Online Passive-Aggressive Algorithms, 2003, J. Mach. Learn. Res.

[20] Giorgio Satta, et al. Guided Learning for Bidirectional Sequence Classification, 2007, ACL.