Two-staged extraction of attribute-value from description of exhibits in net auction system.

In order to achieve faceted search in net auction system, several researchers have dealt with the automated extraction of attributes and their values from descriptions of exhibits. In this paper, we propose a two-staged method to improve the performance of the extraction. The proposed method is based on the following two assumptions. 1) Identifying whether or not each sentence includes the target information is easier than extracting the target information from raw plain text. 2) Extracting the target information from the sentences selected in the first stage is easier than extracting the target information from the entire raw plain text. In the first stage, the method selects each sentence in a description that is judged to have attributes and/or values. In this stage, each sentence is represented a bag-of-words-styled feature vector, and is labeled as selected or not by a classifier derived by SVM. In the second stage, the extraction of attributes and values are performed on the cleaned text that does not contain parts of description irrelevant to exhibits, like descriptions for the postage, other exhibits, and so on. In the second stage, we adopt a sequential labeling method similar to named entity recognizers. The experimental result shows that the proposed method improves both the precision and the recall in the attribute-value extraction than only using second-stage extraction method. This fact supports our assumptions.