eBizSearch: an OAI-compliant digital library for ebusiness

Niche search engines offer an efficient alternative to general-purpose search engines when the results those engines return are not sufficiently relevant and when nontraditional search features are required. By concentrating on a single domain, niche search engines can achieve higher relevance and offer enhanced features. We discuss a new digital library niche search engine, eBizSearch, dedicated to e-business and e-business documents. The underlying technology for eBizSearch is CiteSeer, a special-purpose digital library and search engine with automatic document indexing, developed at NEC Research Institute. We present the integration of CiteSeer into the eBizSearch framework and the process needed to tune the whole system to the specific area of e-business. We also show how we use machine learning algorithms to generate the metadata that makes eBizSearch Open Archives compliant.
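
To make "Open Archives compliant" concrete: a compliant repository answers harvesting requests defined by the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), each of which is an HTTP request carrying a `verb` argument. The minimal Python sketch below only builds such a request URL and does not contact any server. The endpoint `http://example.org/oai` and the helper `build_oai_request` are hypothetical names chosen for illustration and are not taken from eBizSearch; the six verbs and the `oai_dc` (unqualified Dublin Core) metadata prefix come from the protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not the eBizSearch address).
BASE_URL = "http://example.org/oai"

# The six request verbs defined by OAI-PMH.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(base_url, verb, **params):
    """Return the URL of an OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # The verb comes first, followed by any arguments that were supplied.
    query = {"verb": verb}
    query.update({k: v for k, v in params.items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    # Ask the repository for all records as unqualified Dublin Core.
    print(build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

A harvester would send this URL as an HTTP GET and parse the XML response; the automatically generated metadata mentioned in the abstract is the kind of content such a response would carry.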
