Computer Science Literature and the World Wide Web

We analyze the computer science literature on the web and compare it to the literature indexed in the Science Citation Index (SCI). The web contains articles from throughout the research timeline, from technical reports and conference papers to journal articles and book chapters, whereas SCI focuses on journal articles. Analyzing the citation patterns of these articles, we find that journal articles and books dominate the most-cited items, both for papers on the web and for papers in SCI. However, we find that conference papers and technical reports play a very important role in computer science research, especially in providing access to the very latest research. Analysis of citations over time suggests that citations to conference papers and technical reports tend to be replaced with citations to journal articles and books once those become available.

The web is changing the way that researchers access scientific literature. In computer science in particular, research papers are often made available on the homepages of authors and institutions. In this paper, we analyze publication and citation patterns for computer science papers on the web, and compare our results with a similar analysis of computer science literature in the Science Citation Index [2, 3].

The Science Citation Index® (SCI), created by Dr. Eugene Garfield and the Institute for Scientific Information (ISI) (www.isinet.com), is an index of significant scientific journals. The SCI is created with manual assistance from human indexers, which makes it expensive to index all of the literature; ISI has chosen to restrict indexing primarily to the most significant journals. The SCI began print publication in 1961 and covers about 3,500 source journals. We restricted our analysis primarily to computer science literature by selecting the relevant subject subdivisions (Hardware & Architecture; Information Systems; Software, Graphics & Programming; and others). We accessed SCI via Dialog.
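The finding above that journal articles and books dominate the most-cited items rests on a simple tabulation: count the citations each cited item receives, keep the most-cited items, and measure what share of them belongs to each publication type. The sketch below illustrates that tabulation under stated assumptions and is not the procedure used for SCI or ResearchIndex: the record layout (one `(cited_item_id, item_type)` pair per citation), the type labels, and the function name `type_share_of_top_cited` are all hypothetical.

```python
from collections import Counter


def type_share_of_top_cited(
    citations: list[tuple[str, str]], top_n: int
) -> dict[str, float]:
    """Fraction of the top_n most-cited items that belong to each publication type.

    Assumes each citation is a hypothetical (cited_item_id, item_type) pair.
    If an item appears with conflicting types, the last one seen wins.
    """
    # Number of citations received by each cited item.
    counts = Counter(item for item, _ in citations)
    # Publication type of each cited item (journal, book, conference, ...).
    item_type = {item: kind for item, kind in citations}
    # The top_n most-cited items; ties keep first-seen order.
    top_items = [item for item, _ in counts.most_common(top_n)]
    if not top_items:
        return {}
    type_counts = Counter(item_type[item] for item in top_items)
    return {kind: n / len(top_items) for kind, n in type_counts.items()}


if __name__ == "__main__":
    # Toy data with made-up item IDs, for illustration only.
    sample = [
        ("item-a", "book"), ("item-a", "book"),
        ("item-b", "tech_report"),
        ("item-c", "journal"), ("item-c", "journal"), ("item-c", "journal"),
        ("item-d", "conference"),
    ]
    print(type_share_of_top_cited(sample, top_n=2))  # {'journal': 0.5, 'book': 0.5}
```

Tracking the same shares separately for each citing year would be one way to look for the shift from conference and technical report citations toward journal and book citations over time.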
In order to work around limits on the amount of data that Dialog would permit us to sort, we partitioned the data into groups of 2,000 computer-science-related source articles and retrieved a systematic sample of 15 groups for further processing. Our analysis covers 30,000 source articles published between 1973 and 1999, containing about 400,000 citations [4].

ResearchIndex (researchindex.org), also known as CiteSeer, provides functionality similar to the SCI, in addition to other features, for literature on the web. The ResearchIndex software may be used on any database of scientific literature; however, the service at researchindex.org currently indexes literature freely available on the publicly indexable web [6]. ResearchIndex uses Autonomous Citation Indexing (ACI) [7, 5] to create a citation index without any manual assistance, and allows researchers to perform literature search and evaluation on a database of over 300,000 computer science articles. It also provides a unique opportunity to analyze the computer science literature, much of which has not previously been available in traditional indexing services. At the time of this analysis, ResearchIndex consisted of about 200,000 articles containing about 3 million citations.

Computer science articles on the web

Figure 1 shows the distribution of articles in SCI and ResearchIndex. For ResearchIndex, the distribution is approximated from manual coding of 500 randomly selected articles, of which only 36% were cited within