A pipeline for reconstructing cross-shredded English documents

The reconstruction of shredded documents is of great significance in the fields of file confidentiality, anti-disclosure, and investigative science. In this paper, a complete and practical pipeline is designed to reconstruct cross-shredded English documents. The pipeline first groups the shreds into clusters, one per row of the original page, using an improved K-means algorithm that reduces clustering imbalance. Because the characters in English text are not aligned, a preprocessing step is required before feature vectors are extracted from the shreds. Owing to its successful performance in reconstructing strip-shredded documents, the Hungarian algorithm is then applied to order the cross-shredded shreds within each row. Finally, the joined horizontal strips are positioned relative to one another by considering the complementary relationship of the edge vectors between neighboring shreds. Experimental results indicate that the designed pipeline achieves high precision and efficiency.
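As an illustration of the row-ordering step, the following is a minimal sketch rather than the paper's implementation. It assumes that each shred is reduced to a pair of binary edge vectors (left edge, right edge), that the mismatch between one shred's right edge and another's left edge is a simple Hamming distance, and that the leftmost shred of a row is the one with an entirely blank left edge. The names `edge_cost`, `order_row`, and `BIG` are hypothetical and introduced only for this example. The Hungarian algorithm assigns each shred its most plausible right-hand neighbor so that the total mismatch is minimal.

```python
BIG = 10**6  # assumed penalty that forbids a shred from following itself


def hungarian(cost):
    """Solve the square assignment problem; return the column chosen for each row."""
    n = len(cost)
    inf = float("inf")
    u = [0] * (n + 1)    # row potentials
    v = [0] * (n + 1)    # column potentials
    p = [0] * (n + 1)    # p[j] = row (1-based) matched to column j, 0 if free
    way = [0] * (n + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to column 0.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def edge_cost(right_edge, left_edge):
    """Hamming distance between two binary edge vectors (assumed cost)."""
    return sum(a != b for a, b in zip(right_edge, left_edge))


def order_row(shreds):
    """Return shred indices in left-to-right order for one row."""
    n = len(shreds)
    cost = [[BIG if i == j else edge_cost(shreds[i][1], shreds[j][0])
             for j in range(n)] for i in range(n)]
    successor = hungarian(cost)  # successor[i] = shred placed right of shred i
    # Assumption: the row starts at the shred whose left edge is blank margin.
    start = next(i for i, (left, _) in enumerate(shreds) if not any(left))
    order = [start]
    while len(order) < n:
        order.append(successor[order[-1]])
    return order


# Four shreds of one row, shuffled; each is (left_edge, right_edge).
shreds = [
    ((0, 1, 1, 0), (1, 1, 0, 1)),
    ((0, 0, 0, 0), (1, 0, 1, 0)),
    ((1, 1, 0, 1), (0, 0, 0, 0)),
    ((1, 0, 1, 0), (0, 1, 1, 0)),
]
print(order_row(shreds))  # → [1, 3, 0, 2]
```

An assignment solution can in principle split into several disjoint cycles instead of one chain through the whole row. This sketch does not detect or repair that case, so a practical pipeline would need an additional check.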
