Synonymous Chinese Transliterations Retrieval from World Wide Web by Using Association Words

We present a framework for mining synonymous transliterations from a set of Web pages collected via a search engine. An integrated statistical measure is proposed for selecting association words, which are combined with the input transliteration to form the search keywords used to retrieve relevant Web snippets. To help identify synonymous transliterations, we employ a scheme that compares the similarity between two transliterations. Experimental results show that, on average, about 5.04 synonymous transliterations are harvested per input transliteration. The retrieved results could be beneficial for constructing ontologies, especially in the domain of foreign person names.
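The abstract does not specify how the similarity between two transliterations is computed. As a rough illustration only, the sketch below assumes the candidates have already been romanized (for example, to Pinyin) and scores them with a normalized edit distance; the function names, the threshold, and the choice of edit distance are assumptions made for this example, not the method of the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the romanized strings are identical."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def filter_synonyms(query: str, candidates: list[str],
                    threshold: float = 0.5) -> list[str]:
    """Keep candidates whose similarity to the query reaches the threshold.

    The threshold is a hypothetical value chosen for illustration.
    """
    return [c for c in candidates
            if c != query and similarity(query, c) >= threshold]


# Hypothetical romanized candidates extracted from Web snippets.
kept = filter_synonyms("bushi", ["buxi", "kelindun"])
```

Here "bushi" and "buxi" stand for romanizations of two Chinese renderings of the same foreign name, while "kelindun" is an unrelated name that the filter is expected to reject. A phonetically informed measure would likely separate candidates better than plain character edits.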
