Decoding Anagrammed Texts Written in an Unknown Language and Script

Algorithmic decipherment is a prime example of a truly unsupervised problem. The first step in the decipherment process is identifying the language of the encrypted text. We propose three methods for determining the source language of a document enciphered with a monoalphabetic substitution cipher. The best method achieves 97% accuracy on 380 languages. We then present an approach to decoding anagrammed substitution ciphers, in which the letters within words have been arbitrarily transposed. It obtains an average decryption word accuracy of 93% on a set of 50 ciphertexts in 5 languages. Finally, we report results on the Voynich manuscript, an unsolved fifteenth-century cipher, which suggest Hebrew as the language of the document.
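The abstract does not describe the three language-identification methods. One simple idea that fits the setting is offered here as an illustrative assumption, not as the paper's method. A monoalphabetic substitution only relabels letters, and anagramming only reorders letters inside words. Neither operation changes the multiset of letter frequencies. The sketch below compares the descending-sorted frequency profile of a ciphertext against plaintext samples of candidate languages. The function names, the `samples` dictionary, the L1 distance, and the `size` padding parameter are all hypothetical choices made for illustration.

```python
from collections import Counter


def sorted_profile(text: str, size: int = 32) -> list[float]:
    """Relative letter frequencies sorted in descending order, zero-padded to `size`.

    Assumption: a substitution only relabels symbols and anagramming only
    reorders letters within words, so neither operation changes this profile.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return [0.0] * size
    total = len(letters)
    freqs = sorted((count / total for count in Counter(letters).values()), reverse=True)
    freqs = freqs[:size]
    return freqs + [0.0] * (size - len(freqs))


def profile_distance(p: list[float], q: list[float]) -> float:
    """L1 distance between two equal-length sorted profiles (an arbitrary choice)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def identify_language(ciphertext: str, samples: dict[str, str]) -> str:
    """Return the candidate language whose plaintext sample has the closest profile.

    `samples` is a hypothetical mapping from a language name to plaintext
    written in that language; any per-language corpus could fill this role.
    """
    target = sorted_profile(ciphertext)
    return min(
        samples,
        key=lambda lang: profile_distance(target, sorted_profile(samples[lang])),
    )
```

Because transposing letters within words leaves letter counts unchanged, the same profile can be computed for anagrammed ciphertexts without first undoing the transpositions. A frequency profile discards all information about letter order, so it is best treated as a baseline rather than a substitute for stronger language models.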
