A support vector machines approach to Vietnamese key phrase extraction

Ho Chi Minh University of Industry, HCMC, 12 Nguyen Van Bao St, Go Vap Dist, Viet Nam; HCMC University of Technology, HCMC, 268 Ly Thuong Kiet St, Dist 10, Viet Nam
Abstract: Automatic key phrase extraction is the task of automatically selecting a set of phrases that describe the content of a simple sentence. A key phrase counts as extracted when it is present verbatim in the sentence to which it is assigned. Accurate key phrase extraction is fundamental to the success of many recent digital library applications, clustering, and semantic information retrieval techniques. The present research discusses a support vector machines (SVMs) approach to Vietnamese key phrase extraction and presents a number of experiments in which performance is incrementally improved. In general, the Vietnamese key phrase extraction process consists of three steps: word segmentation to identify the lexical units in an input sentence, part-of-speech tagging of the words, and key phrase extraction over the resulting phrases. The performance of Vietnamese key phrase extraction systems is generally measured by the precision rate attained, which depends strongly on the nature and size of the training set of key phrases. Most results exceed 70.30% with a training set of 9,000 Vietnamese key phrases from 2,000 sentences selected from the corpus of the Vietnamese Lexicography Center (www.vietlex.com.vn). © 2009 IEEE.
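The abstract does not define how the precision rate is computed. The sketch below assumes the usual definition, the fraction of extracted phrases that also appear in a reference set, and also checks the verbatim-presence condition stated above. The function names, the sample sentence, and the phrase sets are hypothetical illustrations, not data or code from the paper.

```python
from typing import Set


def is_verbatim(phrase: str, sentence: str) -> bool:
    """Return True if the phrase occurs word-for-word in the sentence."""
    return phrase in sentence


def precision(extracted: Set[str], reference: Set[str]) -> float:
    """Fraction of extracted phrases found in the reference set.

    Assumed definition: |extracted & reference| / |extracted|.
    Returns 0.0 when nothing was extracted.
    """
    if not extracted:
        return 0.0
    return len(extracted & reference) / len(extracted)


# Hypothetical example sentence and phrase sets (not from the paper's corpus).
sentence = "Sinh viên học tiếng Việt"
extracted = {"Sinh viên", "tiếng Việt", "học"}
reference = {"Sinh viên", "tiếng Việt"}

# Every extracted phrase must be present verbatim in the sentence.
all_verbatim = all(is_verbatim(p, sentence) for p in extracted)
score = precision(extracted, reference)
```

Here two of the three extracted phrases match the reference set, so `score` is 2/3 under the assumed definition.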
Author Keywords: Key phrase; Natural language processing; Part-of-speech; Support vector machines; Vietnamese key phrase extraction; Word segmentation
Index Keywords: Key phrase; Natural language processing; Part-of-speech; Vietnamese key phrase extraction; Word segmentation; Computational linguistics; Computer science; Digital libraries; Information services; Natural language processing systems; Research; Support vector machines; Vectors; Feature extraction
Year: 2009
Source title: 2009 IEEE-RIVF International Conference on Computing and Communication Technologies: Research, Innovation and Vision for the Future, RIVF 2009
Art. No.: 5174613
Correspondence Address: Nguyen, C. Q.; Ho Chi Minh University of Industry, HCMC, 12 Nguyen Van Bao St, Go Vap Dist, Viet Nam; email: chaunq@cse.hcmut.edu.vn
Conference name: 2009 IEEE-RIVF International Conference on Computing and Communication Technologies: Research, Innovation and Vision for the Future, RIVF 2009
Conference date: 13 July 2009 through 17 July 2009
Conference location: Danang City
Conference code: 78379
ISBN: 9.78142E+12
DOI: 10.1109/RIVF.2009.5174613