Extracting ontologies from World Wide Web via HTML tables

Minoru Yoshida, Kentaro Torisawa and Jun’ichi Tsujii

1 Department of Computer Science, Graduate School of Information Science and Technology
2 School of Information Science, Japan Advanced Institute of Science and Technology
3 Information and Human Behavior, PRESTO, Japan Science and Technology Corporation
CREST, JST (Japan Science and Technology Corporation)

Postal address: Department of Computer Science, Graduate School of Information Science and Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
Telephone: +81 3 5803 1697
Facsimile: +81 3 5802 8872
{mino, tsujii}@is.s.u-tokyo.ac.jp, torisawa@jaist.ac.jp
