Two approaches to gathering text corpora from the WorldWideWeb

Sixteenth Annual Symposium of the Pattern Recognition Association of South Africa, Langebaan, South Africa, 23-25 November 2005

G. Botha | E. Barnard

[1] Penn Treebank. Linguistic Data Consortium, 1999.

[2] James H. Martin, et al. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2000, Prentice Hall series in artificial intelligence.

[3] Gökhan Dalkiliç, et al. Word statistics of Turkish language on a large scale text corpus - TurCo, 2004, International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004.

[4] Claire Waast-Richard, et al. A transformation-based learning approach to language identification for mixed-lingual text-to-speech synthesis, 2005, INTERSPEECH.

[5] Alex Acero, et al. Spoken Language Processing, 2001.

[6] Michael Barlow, et al. Language model acquisition from a text corpus for speech understanding, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[7] Hsinchun Chen, et al. Using Genetic Algorithm in Building Domain-Specific Collections: An Experiment in the Nanotechnology Domain, 2005, Proceedings of the 38th Annual Hawaii International Conference on System Sciences.

[8] Mübeccel Demirekler, et al. On developing new text and audio corpora and speech recognition tools for the Turkish language, 2002, INTERSPEECH.

[9] Bidyut Baran Chaudhuri, et al. Using text corpora for understanding polysemy in Bangla, 2002, Language Engineering Conference, 2002. Proceedings.