Concept-Instance Relation Extraction from Simple Noun Sequences Using a Full-Text Search Engine

This paper describes a simple method for acquiring concept-instance relations from simple noun sequences that frequently appear in Japanese Web documents. In Japanese, many noun sequences consist of two NPs that stand in a concept-instance relation. This phenomenon is similar to apposition in English, but differs in that many of these noun sequences provide no explicit clues, such as the proper-noun capitalization or commas used in English apposition, to indicate the boundary between the concept name and the instance name. We developed a method to detect such implicit boundaries between concept names and instance names, and to filter out erroneous concept-instance relations by using a search engine.