Deploying Lucene on the Grid

We investigate whether and how open-source retrieval engines can be deployed in a grid environment. Compared with conventional distributed IR, one of the most significant differences is the lack of a priori knowledge about the available nodes. In addition, it is unknown when a particular node will have the time and resources to start a submitted job. Conventional methods such as RMI are therefore not directly usable, and we propose a different approach that uses middleware designed specifically for grids. We describe GridLucene, an extension of the open-source engine Lucene with grid-specific classes built on this middleware. We report on an initial comparison between GridLucene and Lucene and find a minor penalty in execution time for grid-based indexing and a more serious penalty for grid-based retrieval. The middleware can gather a set of physical resources into a single logical resource with some abstract properties. These user-definable properties can be used during indexing and retrieval to tell GridLucene which files it needs to access. Using this kind of semantic information, grid nodes can "discover" which indices exist on the grid and which particular documents need to be indexed. GridLucene is available for download under the same license as Lucene.
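
The idea of logical resources with user-definable properties can be illustrated with a small, language-neutral sketch. It is not GridLucene's code or the middleware's API: the `LogicalResource` and `Catalog` classes, the property names (`type`, `indexed`), and the file locations are all hypothetical, chosen only to show how a node without prior knowledge of the grid might look up existing indices and documents that have not yet been indexed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LogicalResource:
    """A named group of physical files plus user-defined properties (hypothetical model)."""
    name: str
    physical_files: List[str]
    properties: Dict[str, str] = field(default_factory=dict)


class Catalog:
    """In-memory stand-in for the middleware's resource catalog (an assumption)."""

    def __init__(self) -> None:
        self._resources: List[LogicalResource] = []

    def register(self, resource: LogicalResource) -> None:
        self._resources.append(resource)

    def find(self, **wanted: str) -> List[LogicalResource]:
        """Return resources whose properties match every requested key/value pair."""
        return [
            r for r in self._resources
            if all(r.properties.get(key) == value for key, value in wanted.items())
        ]


# Example data; all names and locations are made up for illustration.
catalog = Catalog()
catalog.register(LogicalResource(
    "docs-part-1", ["node-a:/data/part1.txt"], {"type": "documents", "indexed": "no"}))
catalog.register(LogicalResource(
    "docs-part-2", ["node-b:/data/part2.txt"], {"type": "documents", "indexed": "yes"}))
catalog.register(LogicalResource(
    "index-part-2", ["node-b:/index/part2"], {"type": "index"}))

# A node that knows nothing about the grid in advance queries by property.
existing_indices = [r.name for r in catalog.find(type="index")]
pending_documents = [r.name for r in catalog.find(type="documents", indexed="no")]
```

In this sketch, a retrieval job would search the resources listed in `existing_indices`, while an indexing job would pick up those in `pending_documents`. Neither needs a fixed list of nodes, which is the property the abstract relies on.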
