This paper presents a method for finding a specification page on the web for a given object (e.g., "Titanic") and its class label (e.g., "film"). A specification page for an object is a web page which gives concise attribute-value information about the object (e.g., "director"-"James Cameron" for "Titanic"). A simple unsupervised method using layout and symbolic decoration cues was applied to a large number of web pages to acquire the class attributes. We used these acquired attributes to select a representative specification page for a given object from the web pages retrieved by a normal search engine. Experimental results revealed that our method greatly outperformed the normal search engine in terms of specification retrieval.
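The selection step described above can be illustrated with a minimal sketch. It assumes the class attributes have already been acquired and that the search engine returns (URL, text) pairs. The scoring rule (counting how many distinct attributes appear in a page), the function names, the attribute list, and the example pages are all hypothetical illustrations, not the scoring actually used in the paper.

```python
import re


def attribute_coverage(page_text, attributes):
    """Count how many distinct class attributes occur in the page text.

    Assumption: simple case-insensitive whole-word matching stands in for
    whatever matching the actual method uses.
    """
    text = page_text.lower()
    return sum(
        1
        for attr in attributes
        if re.search(r"\b" + re.escape(attr.lower()) + r"\b", text)
    )


def select_specification_page(pages, attributes):
    """Return the (url, text) pair whose text covers the most attributes.

    `pages` is expected in search-engine rank order; max() keeps the first
    of several equally scored pages, so ties fall back to that order.
    """
    return max(pages, key=lambda page: attribute_coverage(page[1], attributes))


# Hypothetical attributes for the class "film" and hypothetical retrieved pages.
film_attributes = ["director", "cast", "release date"]
pages = [
    ("http://example.com/review", "Titanic is a moving film with a great cast."),
    (
        "http://example.com/spec",
        "Director: James Cameron. Cast: Leonardo DiCaprio, Kate Winslet. "
        "Release date: 1997.",
    ),
]

best_url, _ = select_specification_page(pages, film_attributes)
print(best_url)
```

In this toy input the second page mentions all three attributes while the first mentions only one, so the second page is selected even though the search engine ranked it lower.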