Privacy Preserving C4.5 Algorithm over Vertically Distributed Datasets

A primary task in privacy-preserving data mining in distributed environments is to protect privacy while still obtaining accurate relationships in the data. This paper shows how two parties can collaboratively build a decision tree without revealing private data when the dataset is vertically distributed. It presents PPC4.5, a privacy-preserving variant of C4.5 for vertically distributed datasets, together with an algorithm for computing the information gain ratio of a node and selecting its best split attribute. The secure scalar product protocol and the x ln(x) protocol are used for the collaborative computation, which protects privacy effectively.
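The abstract does not give protocol details. The sketch below shows why a secure scalar product is enough to obtain the counts that C4.5's gain ratio needs when the data are split vertically. It makes two assumptions. First, it uses the commodity-server variant of the secure scalar product in the style of Du and Atallah. Second, Alice holds one attribute column and Bob holds the class labels. Under those assumptions, the number of records with A = a and class = c equals the scalar product of Alice's 0/1 indicator vector for a and Bob's 0/1 indicator vector for c. The names `secure_scalar_product`, `joint_count`, and `gain_ratio`, and the modulus `P`, are hypothetical. The sketch also differs from PPC4.5 in one important way: it reconstructs the counts in the clear. The paper additionally relies on an x ln(x) protocol, presumably so that entropy terms need not be revealed.

```python
import math
import random
from collections import Counter

P = 2**61 - 1  # hypothetical prime modulus for additive shares


def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % P


def commodity_server(n):
    """Third party that never sees inputs: hands out correlated randomness."""
    ra_vec = [random.randrange(P) for _ in range(n)]
    rb_vec = [random.randrange(P) for _ in range(n)]
    ra = random.randrange(P)
    rb = (dot(ra_vec, rb_vec) - ra) % P  # ensures ra + rb = Ra . Rb (mod P)
    return (ra_vec, ra), (rb_vec, rb)


def secure_scalar_product(x, y):
    """Alice holds x, Bob holds y; returns shares v1 (Alice), v2 (Bob) with v1 + v2 = x.y mod P."""
    (ra_vec, ra), (rb_vec, rb) = commodity_server(len(x))
    x_hat = [(a + r) % P for a, r in zip(x, ra_vec)]  # Alice -> Bob (masked x)
    y_hat = [(b + r) % P for b, r in zip(y, rb_vec)]  # Bob -> Alice (masked y)
    v2 = random.randrange(P)                          # Bob's random output share
    t = (dot(x_hat, y) + rb - v2) % P                 # Bob -> Alice
    v1 = (t - dot(ra_vec, y_hat) + ra) % P            # Alice's output share
    return v1, v2


def joint_count(alice_col, a_value, bob_col, c_value):
    """|{records : A = a_value and class = c_value}| as a scalar product of indicators."""
    x = [1 if v == a_value else 0 for v in alice_col]
    y = [1 if v == c_value else 0 for v in bob_col]
    v1, v2 = secure_scalar_product(x, y)
    return (v1 + v2) % P  # reconstructed here only for illustration


def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def gain_ratio(alice_col, bob_labels):
    """C4.5 gain ratio of Alice's attribute with respect to Bob's class labels."""
    n = len(bob_labels)
    a_vals, c_vals = sorted(set(alice_col)), sorted(set(bob_labels))
    class_entropy = entropy(list(Counter(bob_labels).values()))  # Bob computes locally
    cond, split_sizes = 0.0, []
    for a in a_vals:
        row = [joint_count(alice_col, a, bob_labels, c) for c in c_vals]
        size = sum(row)
        split_sizes.append(size)
        cond += size / n * entropy(row)
    split_info = entropy(split_sizes)
    return (class_entropy - cond) / split_info if split_info else 0.0
```

For example, `gain_ratio(["sunny", "sunny", "overcast", "rain", "rain"], ["no", "no", "yes", "yes", "no"])` evaluates the split on Alice's column. In a full tree builder, the attribute with the highest gain ratio would be chosen as the node's split.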