Research and Application of Document Similarity Measuring Algorithm

To address the limitations of document similarity measures based on the vector space model (VSM), this paper proposes an algorithm based on the common substrings of strings. Students' graduation-design documents are stored as XML, and a document object tree is generated for each with the DOM API in Java. The algorithm visits the vertices of the tree by depth-first search and compares them to count the homologous (matching) text segments. Because it also takes the similarity of document structures into account, the new algorithm can judge document similarity effectively.
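
As an illustration only, the approach summarized above might be sketched in Java as follows. The class name DocSimilarity, the helper methods, the match threshold, and the final ratio are all assumptions made for this sketch and are not taken from the paper. The sketch covers only the DOM parsing, the depth-first traversal, and the common-substring comparison; it does not model the structural similarity of the documents.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch: names, threshold, and scoring rule are assumptions.
final class DocSimilarity {

    // Depth-first traversal of the DOM tree, collecting non-empty text nodes.
    static void collectText(Node node, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add(text);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
    }

    // Parses an XML string and returns its text segments in depth-first order.
    static List<String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> texts = new ArrayList<>();
            collectText(doc.getDocumentElement(), texts);
            return texts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Length of the longest common (contiguous) substring, by dynamic programming.
    static int longestCommonSubstring(String a, String b) {
        int best = 0;
        int[] prev = new int[b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    cur[j] = prev[j - 1] + 1;
                    best = Math.max(best, cur[j]);
                }
            }
            prev = cur;
        }
        return best;
    }

    // Assumed scoring rule: a segment of A counts as matched when its longest
    // common substring with some segment of B covers at least `threshold` of
    // the longer of the two; the score is the fraction of matched segments.
    static double similarity(String xmlA, String xmlB, double threshold) {
        List<String> a = parse(xmlA);
        List<String> b = parse(xmlB);
        if (a.isEmpty()) {
            return 0.0;
        }
        int matched = 0;
        for (String s : a) {
            for (String t : b) {
                int lcs = longestCommonSubstring(s, t);
                if ((double) lcs / Math.max(s.length(), t.length()) >= threshold) {
                    matched++;
                    break;
                }
            }
        }
        return (double) matched / a.size();
    }

    public static void main(String[] args) {
        // Made-up sample documents, for demonstration only.
        String a = "<thesis><title>Document similarity</title>"
                + "<abstract>We compare documents by common substrings.</abstract></thesis>";
        String b = "<thesis><title>Image retrieval</title>"
                + "<abstract>We compare documents by common substrings.</abstract></thesis>";
        System.out.println(similarity(a, b, 0.8));
    }
}
```

In this sketch the two sample documents share one of two text segments, so the abstracts match while the titles do not. A fuller version would presumably also compare element names or tree paths to reflect document structure.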