Language Identification With Confidence Limits
A statistical classification algorithm and its application to language identification from noisy input are described. The main innovation is to compute confidence limits on the classification, so that the algorithm terminates once enough evidence has been gathered to make a clear decision, thereby avoiding problems with categories that have similar characteristics. A second application, to genre identification, is briefly examined. The results show that some of the problems of other language identification techniques can be avoided, and illustrate a more important point: that a statistical language process can be used to provide feedback about its own success rate.
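The abstract does not say which statistical model or which confidence test is used, so the following is only a minimal sketch of the general idea of stopping once the evidence is clear. It assumes toy character-bigram models with add-one smoothing and a normal-approximation confidence limit on the per-bigram log-likelihood difference between the two leading languages. The names `train` and `identify` and the parameters `vocab`, `z`, and `min_tokens` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def train(text, vocab=1000):
    """Build a toy character-bigram scorer for one language.

    `vocab` is an assumed smoothing constant, not a value from the paper.
    """
    counts = Counter(zip(text, text[1:]))
    total = sum(counts.values())
    # Add-one smoothing so unseen bigrams still get a finite log-probability.
    return lambda bigram: math.log((counts[bigram] + 1) / (total + vocab))


def identify(text, models, z=2.58, min_tokens=5):
    """Return (language, bigrams_read, decided).

    Reads bigrams one at a time and stops as soon as the lower confidence
    limit on the mean log-likelihood difference between the two leading
    languages is above zero. Requires at least two models.
    """
    scores = {lang: [] for lang in models}
    n = 0
    for n, bigram in enumerate(zip(text, text[1:]), start=1):
        for lang, log_prob in models.items():
            scores[lang].append(log_prob(bigram))
        if n < min_tokens:
            continue
        ranked = sorted(models, key=lambda l: sum(scores[l]), reverse=True)
        best, second = ranked[0], ranked[1]
        diffs = [a - b for a, b in zip(scores[best], scores[second])]
        mean = sum(diffs) / n
        var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
        std_err = math.sqrt(var / n)
        if mean - z * std_err > 0:
            return best, n, True  # clear decision: stop early
    # Input exhausted without a clear decision: report the best guess,
    # flagged as undecided so the caller knows the result is uncertain.
    best = max(models, key=lambda l: sum(scores[l]))
    return best, n, False
```

The third element of the result is the point of the sketch: the classifier reports whether its own decision cleared the confidence limit, which is the kind of feedback about its success rate that the abstract highlights. A real system would need far larger training samples and a properly justified test statistic.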