Two approaches to gathering text corpora from the World Wide Web

Sixteenth Annual Symposium of the Pattern Recognition Association of South Africa, Langebaan, South Africa, 23-25 November 2005
