Distributed Tracing of Intruders

Abstract : Unwelcome intrusions into computer systems are being perpetrated by strangers, and the number of such incidents is rising steadily. One of the things that facilitates this malfeasance is that computer networks provide the ability for a user to log into multiple computer systems in sequence, changing identity with each step. This makes it very difficult to trace actions on a network of computers all the way back to their actual origins. We refer to this as the tracing problem. This thesis attempts to address this problem by the development of a technology called thumbprinting. Thumbprinting involves forming a signature of the data in a network connection. This signature is a small quantity which does not allow complete reconstruction of the data, but does allow comparison with signatures of other connections to determine with reasonable confidence whether the data were the same or not. This is a potential basis for a tracing system. The specific technology developed to perform this task is local thumbprinting. This involves forming linear combinations of the frequencies with which different characters occur in the network data sampled. The optimal linear combinations are chosen using a statistical methodology called principal component analysis. The difficulties which this process must overcome are outlined, and an algorithm for comparing the thumbprints which adaptively handles these difficulties is presented. A number of experiments with a trial implementation of this method are described. The method is shown to work successfully when given at least a minute and a half of reasonably active network connection. This requires presently about 20 bytes per minute per connection of storage for the thumbprints. In addition, the existing (very limited) literature on the tracing problem is reviewed.