Bilingual PRESRI - Integration of Multiple Research Paper Databases

Collecting all the papers in a research field is the first step toward an exhaustive survey. Many research paper databases are available for searching, but when a field is covered by several databases, searchers must repeat the same search in each one. To remove this inefficiency, we have developed PRESRI, which builds an exhaustive database by integrating multiple research paper databases. First, we collect PostScript and PDF files from the WWW and construct a database ('WEB-DB') by extracting bibliographic information from those files. Second, we construct the exhaustive database by integrating WEB-DB with other databases. As the key technique for this construction, we propose a method based on a support vector machine (SVM) for extracting bibliographic information from PostScript and PDF files. To investigate the effectiveness of the method, we conducted an experiment and found that it is useful for both Japanese and English. We also address the presentation of search results, an important factor in building an efficient survey environment, and have developed a system that lets users intuitively understand the relationships between papers on the basis of citation information.
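The integration step can be illustrated with a minimal sketch. The abstract does not state how PRESRI matches records across databases, so the rule used here (same normalized title and same year) is an assumption made only for illustration, and the function names `normalize_title` and `integrate`, the record fields, and the sample records are all hypothetical.

```python
import unicodedata


def normalize_title(title):
    """Build a matching key: NFKC-normalize, lowercase, keep only letters and digits."""
    text = unicodedata.normalize("NFKC", title).lower()
    # str.isalnum() is Unicode-aware, so Japanese characters are kept as well.
    return "".join(ch for ch in text if ch.isalnum())


def integrate(*databases):
    """Merge records from several databases, combining entries that share a key.

    Assumption: two records describe the same paper when their normalized
    titles and years are equal. This is not the matching rule of the paper.
    """
    merged = {}
    for db in databases:
        for record in db:
            key = (normalize_title(record["title"]), record.get("year"))
            entry = merged.setdefault(key, {})
            for field, value in record.items():
                # Keep the first non-empty value seen for each field.
                if value and not entry.get(field):
                    entry[field] = value
    return list(merged.values())


# Hypothetical records: one extracted from a file on the web, two from another database.
web_db = [
    {"title": "An Example Paper on Citation Analysis", "year": 2003,
     "url": "http://example.org/paper.pdf"},
]
other_db = [
    {"title": "An example paper on citation analysis.", "year": 2003,
     "authors": "A. Author"},
    {"title": "Another Example Paper", "year": 2001, "authors": "B. Author"},
]

papers = integrate(web_db, other_db)
```

Here the first record of each database collapses into a single entry that carries both the URL from the web-derived record and the author field from the other database, while the unmatched record is kept as is. A real system would need a more tolerant comparison, since bibliographic data extracted from files is often noisy.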
