EXTRACTING INFORMATION FROM CITESEER'S TEXTUAL DATA

This article examines CiteSeer, a free online digital library and search engine that indexes mainly computer science research papers. It first describes CiteSeer’s features and structure and then shows what useful information about publications and author collaborations can be extracted from its textual data. We present the basic properties of both the publication citation graph and the author citation graph. We also discuss several parameters based on the structure of the author collaboration graph and report their main statistical properties.
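
To make the notion of an author collaboration graph concrete, the following minimal sketch builds one from per-paper author lists and computes one simple structural parameter, the number of distinct collaborators per author. The function name `build_collaboration_graph`, the toy author names, and the choice of this particular parameter are assumptions made for illustration only; they are not taken from CiteSeer's data or from the article's own method.

```python
from collections import defaultdict
from itertools import combinations


def build_collaboration_graph(papers):
    """Return an undirected co-authorship graph as {author: set of co-authors}.

    `papers` is an iterable of author-name lists, one list per publication
    (a hypothetical input format chosen for this sketch).
    """
    graph = defaultdict(set)
    for authors in papers:
        unique = sorted(set(authors))  # ignore repeated names within one paper
        for author in unique:
            graph[author]  # touch the key so single-author papers yield isolated nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Toy data (hypothetical, not drawn from CiteSeer).
papers = [
    ["A. Author", "B. Author"],
    ["B. Author", "C. Author", "D. Author"],
    ["E. Author"],
]

graph = build_collaboration_graph(papers)

# Number of distinct collaborators per author (the node degree).
degree = {author: len(neighbours) for author, neighbours in graph.items()}
print(degree)
```

In this toy input, "B. Author" appears on two papers and therefore has three distinct collaborators, while "E. Author" has none and remains an isolated node.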