Paper Information - A Language Identification Application Built on the Java Client / Server Platform

We describe an experimental system, implemented in the Java(TM) programming language, that demonstrates a variety of application-level tradeoffs available to distributed natural language processing (NLP) applications. In the context of the World Wide Web (WWW), it is possible to provide value-added functionality to legacy documents in a client-side browser, a document server, or an intermediary agent. Using a well-known n-gram-based algorithm for automatic language identification, we have constructed a system that dynamically adds language labels to whole documents and text fragments. We have experimented with several client/server configurations, and we present the results of tradeoffs made between labelling accuracy and the size/completeness of the language models.
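The abstract does not say which n-gram algorithm the system uses, so the following is only a minimal sketch of the general idea, assuming character trigrams scored with add-one smoothing. The class name `NgramLanguageSketch`, the tiny training samples, and the `maxNgrams` cutoff are all hypothetical; the cutoff is included to illustrate how a smaller, less complete language model might trade accuracy for size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NgramLanguageSketch {
    static final int N = 3; // character trigrams (an assumption, not from the paper)

    // Count overlapping character n-grams in lower-cased, space-padded text.
    static Map<String, Integer> countNgrams(String text) {
        String padded = " " + text.toLowerCase() + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + N <= padded.length(); i++) {
            counts.merge(padded.substring(i, i + N), 1, Integer::sum);
        }
        return counts;
    }

    // Build a language model, keeping only the maxNgrams most frequent n-grams.
    // A smaller cutoff gives a smaller (less complete) model.
    static Map<String, Integer> train(String sample, int maxNgrams) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(countNgrams(sample).entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> model = new HashMap<>();
        int keep = Math.min(maxNgrams, entries.size());
        for (Map.Entry<String, Integer> e : entries.subList(0, keep)) {
            model.put(e.getKey(), e.getValue());
        }
        return model;
    }

    // Log-probability of the text under a model, with add-one smoothing
    // (one extra vocabulary slot stands in for all unseen n-grams).
    static double score(Map<String, Integer> model, String text) {
        int total = 0;
        for (int c : model.values()) {
            total += c;
        }
        int vocab = model.size() + 1;
        double logProb = 0.0;
        for (Map.Entry<String, Integer> e : countNgrams(text).entrySet()) {
            int seen = model.getOrDefault(e.getKey(), 0);
            logProb += e.getValue() * Math.log((seen + 1.0) / (total + vocab));
        }
        return logProb;
    }

    // Return the label of the best-scoring model.
    static String identify(Map<String, Map<String, Integer>> models, String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Integer>> e : models.entrySet()) {
            double s = score(e.getValue(), text);
            if (s > bestScore) {
                bestScore = s;
                best = e.getKey();
            }
        }
        return best;
    }

    // Toy models trained on one sentence per language (illustrative data only).
    static Map<String, Map<String, Integer>> demoModels(int maxNgrams) {
        Map<String, Map<String, Integer>> models = new LinkedHashMap<>();
        models.put("en", train("the quick brown fox jumps over the lazy dog and "
                + "the cat sat on the mat with the other animals", maxNgrams));
        models.put("es", train("el zorro salta sobre el perro perezoso y el gato "
                + "duerme en la alfombra con los otros animales", maxNgrams));
        return models;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> models = demoModels(1000);
        System.out.println(identify(models, "the dog and the cat"));
        System.out.println(identify(models, "el gato y el perro"));
    }
}
```

Calling `demoModels` with a smaller `maxNgrams` produces more compact models, which is one simple way to explore the accuracy versus model-size tradeoff the abstract mentions. Where the labelling runs (browser, server, or intermediary agent) is outside the scope of this sketch.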

Philip Resnik | Gary Adams

[1] Harald Tveit Alvestrand, et al. Tags for the Identification of Languages, 1995, RFC.

[2] I. Good. The Population Frequencies of Species and the Estimation of Population Parameters, 1953.

[3] Kenneth Ward Church, et al. What's Wrong with Adding One?, 1994.

[4] Glenn Adams, et al. Internationalization of the Hypertext Markup Language, 1997, RFC.