Corpus-Based Learning of Compound Noun Indexing

In this paper, we present a corpus-based learning method that can index diverse types of compound nouns using rules automatically extracted from a large tagged corpus. We develop an efficient way of extracting the compound noun indexing rules automatically and perform extensive experiments to evaluate our indexing rules. The automatic learning method shows about the same performance compared with the manual linguistic approach but is more portable and requires no human efforts. We also evaluate the seven different filtering methods based on both the effectiveness and the efficiency, and present a new method to solve the problems of compound noun over-generation and data sparseness in statistical compound noun processing.

[1]  ChengXiang Zhai,et al.  Noun-Phrase Analysis in Unrestricted Text for Information Retrieval , 1996, ACL.

[2]  Kenneth Ward Church,et al.  Word Association Norms, Mutual Information, and Lexicography , 1989, ACL.

[3]  Alexander Dekhtyar,et al.  Information Retrieval , 2018, Lecture Notes in Computer Science.

[4]  Joel L. Fagan,et al.  The effectiveness of a nonsyntactic approach to automatic phrase indexing for document retrieval , 1989, JASIS.

[5]  Tomek Strzalkowski Natural Language Information Retrieval , 1995, Inf. Process. Manag..

[6]  Edward Fox,et al.  Extending the boolean and vector space models of information retrieval with p-norm queries and multiple concept types , 1983 .

[7]  Gary Geunbae Lee,et al.  Generalized unknown morpheme guessing for hybrid POS tagging of Korean , 1998, VLC@COLING/ACL.

[8]  Jeong Soo Ahn,et al.  Using n-grams for Korean text retrieval , 1996, SIGIR '96.

[9]  Tomek Strzalkowski,et al.  Natural Language Information Retrieval: TREC-8 Report , 1994, TREC.

[10]  Joon Ho Lee,et al.  Combining multiple evidence from different properties of weighting schemes , 1995, SIGIR '95.

[11]  Joel L. Fagan The effectiveness of a nonsyntatic approach to automatic phrase indexing for document retrieval , 1989 .

[12]  Keh-Yih Su,et al.  A Corpus-Based Approach to Automatic Compound Extraction , 1994, ACL.

[13]  Yiming Yang,et al.  A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[14]  Tomek Strzalkowski,et al.  Natural Language Information Retrieval: TIPSTER-2 Final Report , 1996, TIPSTER.

[15]  Kyo Kageura,et al.  Phrase processing methods for Japanese text retrieval , 1998, SIGF.