AllInOneNews: development and evaluation of a large-scale news metasearch engine

AllInOneNews is the largest news metasearch engine in the world, connecting to over 1,000 news sites in more than 150 countries. Implementing a large-scale metasearch engine such as AllInOneNews requires overcoming challenges that do not arise when building small metasearch engines, such as developing highly scalable search engine selection techniques. In this paper, we discuss these challenges and our solutions to them. We also discuss some novel features of AllInOneNews, such as its highly automated solution and semantic query matching. This paper also reports the results of a comparative evaluation of three commercial news search systems: one search engine (Google News) and two metasearch engines (Mamma News and AllInOneNews). The comparison uses several measures, including effectiveness, diversity, and time-sensitivity. Another contribution of this paper is a novel scheme for comparing multiple news search systems with a combined measure that takes into account both the relevance and the time-sensitivity of the retrieved information.
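The abstract does not define the combined measure. As a purely hypothetical illustration, not the scheme proposed in the paper, the sketch below shows one way relevance and time-sensitivity could be folded into a single score: each relevant result in the top k earns credit weighted by an exponential freshness decay, and the credits are averaged over the k slots. The `Result` type, the function names, and the 24-hour half-life are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Result:
    """One retrieved news item with a human relevance judgment (hypothetical schema)."""
    relevant: bool
    published: datetime


def freshness_weight(published: datetime, now: datetime, half_life_hours: float) -> float:
    """Exponential decay: 1.0 for a just-published item, 0.5 after one half-life."""
    # Clamp negative ages (timestamps in the future) to zero.
    age_hours = max((now - published).total_seconds() / 3600.0, 0.0)
    return 0.5 ** (age_hours / half_life_hours)


def time_weighted_precision_at_k(results, k, now, half_life_hours=24.0):
    """Average over the top-k slots of (relevance x freshness).

    Irrelevant items contribute 0; relevant items contribute their freshness
    weight, so a stale but relevant article earns less credit than a fresh one.
    Empty slots (fewer than k results) also count as 0.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    gain = sum(
        freshness_weight(r.published, now, half_life_hours)
        for r in results[:k]
        if r.relevant
    )
    return gain / k


# Illustrative usage with made-up judgments (not data from the paper).
now = datetime(2024, 1, 2, 12, 0)
results = [
    Result(True, now),                           # fresh and relevant: weight 1.0
    Result(True, now - timedelta(hours=24)),     # one half-life old: weight 0.5
    Result(False, now),                          # irrelevant: weight 0
    Result(True, now - timedelta(hours=48)),     # two half-lives old: weight 0.25
]
print(time_weighted_precision_at_k(results, 4, now))
```

A decay-based weight is only one design choice; a binary cutoff (a result counts only if relevant and published within a fixed window) would be an equally plausible way to combine the two criteria.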
