Social (distributed) language modeling, clustering and dialectometry

We present ongoing work on a scalable, distributed implementation of over 200 million individual language models, each capturing a single user's dialect in a given language (multilingual users have several models). These models have a variety of practical applications, ranging from spam detection to speech recognition, as well as dialectometrical methods on the social graph. Users should be able to view any content in their language (even if it is spoken by a small population) and to browse our site with an appropriately translated interface (automatically generated for locales with little crowd-sourced community effort).
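As a rough illustration of the idea of per-user language models compared for dialect similarity, the following minimal sketch builds one smoothed unigram model per user and measures the distance between users with the Jensen-Shannon divergence. The abstract does not state the model type, the smoothing, or the distance measure, so all of these are assumptions, as are the function names and the toy data.

```python
import math
from collections import Counter


def unigram_model(texts, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a shared vocabulary.

    Assumption: whitespace tokenization and unigram counts; the abstract
    does not say which kind of language model is used.
    """
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions
    defined over the same vocabulary. Assumed here as the dialect distance."""
    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a if a[w] > 0)

    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical toy data: three users, two of whom share a dialect marker.
user_texts = {
    "alice": ["y'all coming to the game", "y'all are late"],
    "bob": ["y'all are late to the game"],
    "carol": ["you lot coming to the match", "you lot are late"],
}

vocab = sorted(
    {tok for texts in user_texts.values() for t in texts for tok in t.lower().split()}
)
models = {user: unigram_model(texts, vocab) for user, texts in user_texts.items()}

d_ab = js_divergence(models["alice"], models["bob"])
d_ac = js_divergence(models["alice"], models["carol"])
print(f"alice-bob: {d_ab:.4f}  alice-carol: {d_ac:.4f}")
```

In this toy setting the pairwise distances could feed any standard clustering routine. At the scale the abstract mentions, each user's model would presumably be built and stored independently across machines, which is why the sketch keeps the per-user step free of shared state apart from the vocabulary.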

David Ellis
