Information retrieval on the web

In this paper we review studies of the growth of the Internet and of technologies that are useful for information search and retrieval on the Web. We present data on the Internet from several different sources, including current and projected numbers of users, hosts, and Web sites. Although the numerical figures vary, the overall trends cited by the sources are consistent and point to exponential growth, both in the past and over the coming decade. Given this growth, it is not surprising that about 85% of Internet users surveyed report using search engines and search services to find specific information. The same surveys show, however, that users are not satisfied with the performance of the current generation of search engines: slow retrieval speed, communication delays, and poor quality of retrieved results (e.g., noise and broken links) are commonly cited problems. We discuss the development of new techniques aimed at resolving some of the problems associated with Web-based information retrieval and speculate on future trends.
