KeyWorld: Extracting Keywords from a Document as a Small World

The small-world topology is known to be widespread in biological, social, and man-made systems. This paper shows that small-world structure also exists in documents such as papers. A document is represented as a network in which nodes represent terms and edges represent the co-occurrence of terms. This network is shown to have the characteristics of a small world: nodes are highly clustered, yet the path length between them is small. Based on this topology, we develop an indexing system called KeyWorld, which extracts important terms by measuring their contribution to the graph's small-world structure.
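The sketch below illustrates the pipeline the abstract describes. It is not the authors' implementation. It builds a term co-occurrence graph, computes the two small-world statistics (the average clustering coefficient C and the average shortest-path length L), and scores each term by how much L grows when that term is removed. Several details are assumptions made for illustration. Treating terms in the same sentence as co-occurring is an assumption. Using the increase in L as the "contribution" is also an assumption. Unreachable pairs are charged a distance equal to the node count, so that disconnecting the graph raises L instead of lowering it. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Undirected graph as {term: set(neighbours)}. Two terms are linked if
    they appear in the same sentence (a hypothetical co-occurrence window)."""
    graph = {}
    for sentence in sentences:
        terms = set(sentence)
        for t in terms:
            graph.setdefault(t, set())
        for a, b in combinations(sorted(terms), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering_coefficient(graph):
    """Average local clustering coefficient C; nodes with degree < 2 count as 0."""
    total = 0.0
    for node, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0


def average_path_length(graph, skip=None):
    """Mean shortest-path length L over ordered node pairs (BFS from each node),
    ignoring node `skip`. Assumption: an unreachable pair is charged distance n."""
    nodes = [v for v in graph if v != skip]
    n = len(nodes)
    total, pairs = 0, 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v != skip and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        unreachable = n - len(dist)
        total += sum(dist.values()) + unreachable * n
        pairs += n - 1
    return total / pairs if pairs else 0.0


def rank_terms(graph):
    """Hypothetical contribution score: increase in L when a term is removed.
    Returns (term, score) pairs, highest score first, ties broken by name."""
    base = average_path_length(graph)
    scores = {t: average_path_length(graph, skip=t) - base for t in graph}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    g = build_cooccurrence_graph([["term", "edge", "network"],
                                  ["network", "small", "world"]])
    print(clustering_coefficient(g), average_path_length(g))
    print(rank_terms(g))
```

In this toy example, "network" is the only term linking two tightly clustered groups. Removing it disconnects the graph and sharply raises L, so it ranks first. Removing any other term slightly shortens the remaining paths. This matches the intuition that terms which hold clusters together are the ones most responsible for the small-world shape.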