Extracting Parallel Paragraphs from Common Crawl
[1] Alexandr Andoni, et al. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions, 2006, 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).
[2] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.
[3] Marcin Junczys-Dowmunt, et al. SyMGiza++: Symmetrized Word Alignment Models for Statistical Machine Translation, 2011, SIIS.
[4] Philipp Koehn, et al. Dirt Cheap Web-Scale Parallel Text from the Common Crawl, 2013, ACL.
[5] Robert L. Mercer, et al. The Mathematics of Statistical Machine Translation: Parameter Estimation, 1993, CL.
[6] Brian Quigley, et al. Synthesis Digital Library of Engineering and Computer Science, 2009, Issues in Science and Technology Librarianship.
[7] Moses Charikar, et al. Similarity estimation techniques from rounding algorithms, 2002, STOC '02.
[8] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[9] Stephan Vogel, et al. Parallel Implementations of Word Alignment Tool, 2008, SETQALNLP.
[10] Hairong Kuang, et al. The Hadoop Distributed File System, 2010, IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).
[11] Miquel Esplà-Gomis, et al. Bitextor, a free/open-source software to harvest translation memories from multilingual websites, 2009.
[12] Gareth J. F. Jones, et al. Representing Documents and Queries as Sets of Word Embedded Vectors for Information Retrieval, 2016, ArXiv.
[13] W. Bruce Croft, et al. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2013.
[14] Christopher D. Manning, et al. Bilingual Word Representations with Monolingual Quality in Mind, 2015, VS@HLT-NAACL.
[15] Hermann Ney, et al. A Systematic Comparison of Various Statistical Alignment Models, 2003, CL.
[16] Andy Way, et al. FaDA: Fast Document Aligner using Word Embedding, 2016, Prague Bull. Math. Linguistics.
[17] Ondrej Dusek, et al. The Joy of Parallelism with CzEng 1.0, 2012, LREC.
[18] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[19] M. F., et al. Bibliography, 1985, Experimental Gerontology.
[20] Noah A. Smith, et al. The Web as a Parallel Corpus, 2003, CL.