A Detection Algorithm for the Illegal Copying and Distribution of Digital Goods

This paper addresses the problem of illegal copying and distribution of digital texts. Building on a registration-based mechanism, it presents a reconstruction method and the corresponding overlap measuring algorithms. In the proposed method, the protected digital texts are first represented in terms of both physical structure and semantic content, at different levels and granularities, and are then registered in a database. A web crawler then searches the Internet automatically and periodically and returns suspected digital texts. Finally, using the proposed overlap measuring algorithms, the returned texts are compared with those stored in the database at the semantic content level and at the physical structure level, respectively. If the semantic similarity between two digital texts is smaller than the given threshold, the comparison stops there, which greatly improves efficiency. Because of the flexibility of the representation method and the overlap measuring algorithms, the proposed method can detect not only whole-document illegal copying and distribution, such as equivalence replication, superset replication, and shift whole replication, but also partial illegal copying and distribution, such as subset replication and shift partial replication. At the same time, it scales well to massive data. To examine the effectiveness of the algorithms, experiments were conducted from five aspects. The results show that, compared with the sentence-based exhaustive comparison method, the average precision of the algorithms is 3.4% higher and the average running time is 40.2% lower.
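
The two-level comparison with an early stop on semantic similarity can be illustrated with a minimal sketch. It is not the paper's algorithm: the representations and measures below are assumptions chosen for brevity. Semantic content is approximated by a set of word tokens compared with the overlap coefficient, physical structure is approximated by normalized sentences, and all function names and the threshold value 0.3 are hypothetical.

```python
import re


def term_set(text):
    """Assumed semantic representation: the set of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def sentences(text):
    """Assumed physical-structure representation: whitespace-normalized sentences."""
    parts = re.split(r"[.!?]+", text.lower())
    return [" ".join(p.split()) for p in parts if p.strip()]


def semantic_similarity(a, b):
    """Overlap coefficient of the two term sets (a stand-in measure)."""
    ta, tb = term_set(a), term_set(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


def structural_overlap(registered, suspect):
    """Fraction of registered sentences that also occur in the suspect text."""
    reg = sentences(registered)
    sus = set(sentences(suspect))
    if not reg:
        return 0.0
    return sum(1 for s in reg if s in sus) / len(reg)


def compare(registered, suspect, threshold=0.3):
    """Return None if the semantic check fails, else the structural overlap."""
    if semantic_similarity(registered, suspect) < threshold:
        return None  # early stop: the structural comparison is skipped
    return structural_overlap(registered, suspect)


if __name__ == "__main__":
    registered = "The cat sat on the mat. The dog barked loudly."
    superset = ("Intro text here. The cat sat on the mat. "
                "The dog barked loudly. More words follow.")
    unrelated = "Quarterly revenue grew by ten percent."
    print(compare(registered, superset))
    print(compare(registered, unrelated))
```

In this sketch, a suspect text that embeds the whole registered text passes the semantic check and receives a high structural overlap, while an unrelated text is rejected at the semantic level without any sentence comparison. The overlap coefficient is used in place of a symmetric measure so that subset and superset copies are not penalized for the difference in length.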