Applications Of Informetrics To Information Retrieval Research

Introduction

Information science is an interdisciplinary field that encompasses the study of the production, organization, storage, retrieval, dissemination and use of information. Research may focus on the information user, the systems that provide access to information, or the interface between the two. Over the past fifty years a number of sub-fields have emerged within information science. Two primary areas of study within the discipline are information retrieval (IR) and informetrics. The two specialties have developed from different traditions but share common areas of interest. In this paper, the author provides a nontechnical overview of information retrieval and informetrics for the non-specialist, with a focus on how the intersection of these two areas applies to IR system design and evaluation.

What is information retrieval?

Information retrieval is a selective process by which desired information is extracted from a store of information called a database (Meadow, 1992). Traditionally, IR systems have been used to locate text-based information, either the full text of documents or document surrogates that summarize the contents of documents located outside of the database (e.g. bibliographic records). In recent years, information retrieval has broadened to include multimedia formats such as images, sound and video.

IR system usage has also broadened during this time. Previously, information professionals were the primary users of IR systems, searching systems available through vendors such as DIALOG and EBSCO Information Services. The wider availability of online public access catalogues in libraries, CD-ROM database systems, and, most recently, web search engines has made IR systems much more accessible to end users.

The process of interactive information retrieval involves a dialogue between the searcher and the IR system. The searcher initially submits a query to the IR system.
Queries consist of one or more search terms and operators that define the parameters for records to be retrieved. The query terms are compared to an index of terms within the database using the operators (e.g. AND, OR, NOT) specified in the query. A list of records matching the query criteria is presented to the searcher for perusal. Based on the searcher's inspection of the records retrieved, the query may be reformulated. The process is then repeated.

On the surface, IR systems may resemble commonly used database management systems (DBMS). Although it is possible to develop an IR system using certain DBMS software, physical and philosophical differences distinguish these two types of systems. For example, the concept of relevance is central to information retrieval but does not play a role in DBMS interactions. Due to the ambiguities of language, not all items retrieved may be relevant to the searcher's information need, despite having matched the query parameters. This is the challenge of IR: ensuring the timely retrieval of relevant items while not retrieving items that are non-relevant to the searcher's information need.

Numerous conceptual models have been developed for IR systems. Many of today's IR systems incorporate a Boolean approach, where retrieval is based on an exact or partial match to a query. Many bibliographic database systems accessible within libraries or through database vendors such as DIALOG use this method. Also popular are probabilistic systems, which estimate the likelihood of relevance from the frequency of occurrence of search terms within documents, allowing retrieved items to be presented in rank order based on calculated relevance. Most World Wide Web search engines and other full-text IR systems rely on this approach. Still other systems rely on a vector space model, where potential relevance is determined by the proximity of documents to queries, both represented as vectors in a multi-dimensional space (Salton & McGill, 1983).
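The difference between exact-match Boolean retrieval and ranked vector space retrieval can be illustrated with a small sketch. The code below is only an illustration under stated assumptions, not a description of any system named in this paper: the three-document collection, the function names and the use of raw term counts as vector weights are all invented for the example, and operational systems typically apply more elaborate term weighting.

```python
import math
from collections import Counter, defaultdict

# Toy collection; the texts and document ids are invented for illustration.
DOCS = {
    1: "information retrieval systems retrieve relevant documents",
    2: "database management systems store structured records",
    3: "informetrics studies regularities in information production",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Inverted index: term -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Ids of documents that contain every query term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, terms):
    """Ids of documents that contain at least one query term."""
    return set().union(*(index.get(term, set()) for term in terms))


def boolean_not(index, docs, term):
    """Ids of documents that do not contain the term."""
    return set(docs) - index.get(term, set())


def cosine(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(docs, query):
    """Ids of documents with non-zero similarity to the query, best first.

    Ties are broken by ascending document id.
    """
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine(query_vec, Counter(tokenize(text))), doc_id)
        for doc_id, text in docs.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]


index = build_index(DOCS)
print(boolean_and(index, ["information", "systems"]))  # exact match on both terms
print(rank_by_cosine(DOCS, "information systems"))     # ranked partial matches
```

In this sketch the Boolean AND query returns only the one document containing both terms, whereas the vector space ranking also returns the documents that match a single term, placed lower in the list. This mirrors the contrast drawn above between exact-match retrieval and retrieval ordered by calculated relevance.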
Information retrieval remains a key research area within information science. …

[1]  Amanda Spink,et al.  Real life, real users, and real needs , 2000 .

[2]  Dietmar Wolfram Inter-Record Linkage Structure in a Hypertext Bibliographic Retrieval System , 1996, J. Am. Soc. Inf. Sci..

[3]  Dietmar Wolfram,et al.  Applying Informetric Characteristics of Databases to IR System File Design, Part I: Informetric Models , 1992, Inf. Process. Manag..

[4]  Giles,et al.  Searching the world wide Web , 1998, Science.

[5]  Alfred J. Lotka,et al.  The frequency distribution of scientific productivity , 1926 .

[6]  S. Bradford "Sources of information on specific subjects" by S.C. Bradford , 1985 .

[7]  B. C. Brookes Comments on the scope of bibliometrics , 1988 .

[8]  Huberman,et al.  Strong regularities in world wide web surfing , 1998, Science.

[9]  Amanda Spink,et al.  Searching the Web: the public and their queries , 2001 .

[10]  Michael John Nelson Probabilistic Models For The Simulation Of Bibliographic Retrieval Systems , 1982 .

[11]  Jose-Marie Griffiths,et al.  INDEX TERM INPUT TO IR SYSTEMS , 1975 .

[12]  Dietmar Wolfram,et al.  Applying Informetric Characteristics of Databases to IR System File Design, Part II: Simulation Comparisons , 1992, Inf. Process. Manag..

[13]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[14]  Paul Nicholls,et al.  The maximal value of a zipf size variable: Sampling properties and relationship to other parameters , 1987, Inf. Process. Manag..

[15]  A. Bendell,et al.  Rank Order Distributions and Secondary Key Indexing , 1985, Computer/law journal.

[16]  George Kingsley Zipf,et al.  Human behavior and the principle of least effort , 1949 .

[17]  Charles T. Meadow,et al.  Text information retrieval systems , 1992 .

[18]  Michael J. Nelson Stochastic Models for the Distribution of Index Terms , 1989, J. Documentation.

[19]  Jane Fedorowicz,et al.  The Theoretical Foundation of Zipf's Law and Its Application to the Bibliographic Database Environment , 2007, J. Am. Soc. Inf. Sci..

[20]  J. Tague,et al.  What's the use of bibliometrics ? , 1988 .

[21]  Liwen Qiu Frequency Distributions of Hypertext Path Patterns: A Pragmatic Approach , 1994, Inf. Process. Manag..