Building an Indexing Engine

We present an indexing engine covered by a technology transfer agreement between the University and the private sector. The engine is currently embedded in several applications used by international organizations, where it indexes large, multilingual document collections. Our analysis starts from the particular elements of the technical specifications; we then examine the design and technology choices made to meet the performance and volume constraints. We discuss how to make optimal use of memory, computation, and storage resources, and analyze the serialization and parallelization of processes.
