Automatic searching of tables in digital libraries

Tables are ubiquitous. Unfortunately, no search engine supports table search. In this paper, we propose TableSeer, a novel table-specific search engine that facilitates table extraction, indexing, searching, and sharing. In addition, we propose an extensive set of medium-independent metadata to represent tables precisely. Given a query, TableSeer ranks the returned results using TableRank, an innovative ranking algorithm with a tailored vector space model and a novel term weighting scheme. Experimental results show that TableSeer outperforms existing search engines on table search, and that incorporating multiple weighting factors can significantly improve the ranking results.
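
To make the vector space idea concrete, the following is a minimal sketch of generic TF-IDF ranking with cosine similarity. It is not TableRank: the abstract does not specify the tailored model or the term weighting scheme, so the smoothed IDF formula, the whitespace tokenizer, and all function names here (`tokenize`, `tfidf_vectors`, `cosine`, `rank`) are assumptions chosen for illustration. Each string stands in for the text extracted from one table.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF; one common textbook variant, not the paper's scheme.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    # Return (score, doc_index) pairs, best match first; ties keep index order.
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    tables = [
        "gene expression levels by tissue",
        "melting point of common alloys",
        "web search engine query logs",
    ]
    for score, idx in rank("melting alloys", tables):
        print(f"{score:.3f}  {tables[idx]}")
```

A table-specific ranker would presumably weight terms differently depending on where they occur in the table metadata; this sketch treats all text uniformly and only shows the baseline that such a scheme would tailor.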
