Excel is powerful. Over the last few days I created a small (i.e. partial)
CLEARTEXT GENERATOR for the 340 cipher. This is how it works:
First, we know about the following two sequences in the Z340:
seq1.JPG
We now use a dictionary (this time a large one) and assume that sequence no. 2 contains at least one word of length >4 letters. This need not be the case, but it is a realistic assumption (unless Z used only short or extremely long words in that specific sequence).
However, we still have no idea where such a word of length >4 would actually start.
Now the pattern matching method helps us out. The word 'boob', for example, can be placed into a cipher structure like 'CDDC' or, for homophonic ciphers, even 'CDJC' or 'CDJK', but not into a cipher structure like 'CDCJ' (try it, if you want). The same cipher symbol always stands for the same letter: 'C' is 'C', and the letter 'b' is definitely not an 'o'.
Thus, not every word from the dictionary can be placed at every position of the sequence, because some of the homophones occurring in the sequence repeat (e.g. the double '++').
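For readers who prefer code to spreadsheets, the 'boob' check can be sketched in a few lines of Python. This is only an illustration of the rule described above (one cipher symbol stands for exactly one letter, while one letter may have several symbols); the function name `fits_pattern` is made up for this example and is not part of Cryptool or the Excel file.

```python
def fits_pattern(word, pattern):
    """Return True if `word` can sit under `pattern` in a homophonic cipher.

    Each cipher symbol must always decode to the same letter, but several
    different symbols (homophones) are allowed to decode to one letter.
    """
    if len(word) != len(pattern):
        return False
    symbol_to_letter = {}
    for letter, symbol in zip(word.lower(), pattern):
        # setdefault stores the first letter seen for a symbol and returns
        # the stored letter on every later occurrence of that symbol.
        if symbol_to_letter.setdefault(symbol, letter) != letter:
            return False
    return True


for pattern in ("CDDC", "CDJC", "CDJK", "CDCJ"):
    print(pattern, fits_pattern("boob", pattern))
# CDDC True
# CDJC True
# CDJK True
# CDCJ False
```

'CDCJ' fails because the symbol 'C' would have to be 'b' in the first position and 'o' in the third.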
With the support of Cryptool, it was possible to place every single word (of varying length, but always >4 letters) onto every position in sequence no. 2, always taking into account the cipher pattern as well as the possibility of homophones being used (polyalphabetic, e.g. 'CDJK'). Starting from homophone #1 of the sequence, the whole dictionary is checked against the pattern, then from homophone #2, and so on.
This results in a list of approx. 150,000 words at specific positions in sequence no. 2.
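The sliding step can be sketched as follows. This is not what Cryptool does internally, just a minimal Python illustration of the same idea. The real sequence no. 2 is only in the attached image, so `toy_sequence` and `toy_words` below are invented placeholders, as are the function names.

```python
def fits_at(word, sequence, start):
    """Check `word` against the symbols sequence[start:start + len(word)]."""
    window = sequence[start:start + len(word)]
    if len(window) != len(word):
        return False  # the word would run past the end of the sequence
    symbol_to_letter = {}
    for letter, symbol in zip(word.lower(), window):
        if symbol_to_letter.setdefault(symbol, letter) != letter:
            return False  # one symbol would need two different letters
    return True


def all_placements(sequence, dictionary, min_len=5):
    """List every (start position, word) pair that respects the pattern."""
    hits = []
    for start in range(len(sequence)):
        for word in dictionary:
            if len(word) >= min_len and fits_at(word, sequence, start):
                hits.append((start, word))
    return hits


# Hypothetical 7-symbol sequence with a doubled '+', NOT the real Z340 data.
toy_sequence = list("AB++CDE")
toy_words = ["apple", "doctor", "hello"]
print(all_placements(toy_sequence, toy_words))
# [(0, 'hello'), (1, 'apple')]
```

In the toy run, the doubled '+' forces a double letter at that spot, which is why 'doctor' drops out at every position.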
Once this pattern-matching (cracking) process is done (it was, by the way, described in an early NSA paper by Friedman, 'Military Cryptanalysis', 1952), additional steps remain:
After defining a letter for the '+' symbol, e.g. the letter 'L' or 'S', we can add a list of 5-grams for sequence no. 1.
This is because sequence no. 1 most likely contains two frequent, overlapping trigrams. 'Frequent', because these overlapping trigrams actually occur twice in the cipher although homophones are used (shuffling the alphabet all the time). Infrequent trigrams would hardly occur twice, or at least not in combination with a second trigram that also repeats. This is debatable, but one may expect two repeating trigrams to show up in any homophonic cipher anyway, as some frequent trigrams (e.g. 'AND' or 'THE') would statistically occur around 6-8 times in a 340-letter text.
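As a quick sanity check on the '6-8 times' figure, one can work out which trigram frequency it implies. The sketch below only does that arithmetic; it does not look up real English statistics, and the function name `implied_rate` is made up for this example.

```python
def implied_rate(count, text_length, n=3):
    """Relative frequency an n-gram needs in order to occur `count` times."""
    positions = text_length - n + 1  # a 340-letter text has 338 trigram slots
    return count / positions


for count in (6, 8):
    print(count, round(100 * implied_rate(count, 340), 2))
# 6 1.78
# 8 2.37
```

So the estimate corresponds to a trigram making up roughly 1.8% to 2.4% of all trigrams in the plaintext.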
Now if we combine all of the above, we end up with a 20 MB Excel file and millions of rather complicated formulas (e.g. 'first letter equal to '+' and second letter equal to the third letter of the 5-gram, but length of the word = 8' etc.). To cover all those combinations, millions of calculations are performed 'in the background' to find all combinations of '+' symbol, 5-gram, sequence pattern and, of course, words.
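The combination step can be pictured as the same pattern check, but starting with some symbols already fixed: the '+' letter and whatever symbols of the 5-gram reappear in sequence no. 2. Below is a minimal Python sketch under that assumption. The symbol names ('e1', 'c1', 't1', 'x', 'y'), the toy sequence and the word list are invented for illustration and are not taken from the Z340.

```python
def fits_with_known(word, sequence, start, known):
    """Pattern check that also respects already assigned symbols."""
    window = sequence[start:start + len(word)]
    if len(window) != len(word):
        return False
    symbol_to_letter = dict(known)  # copy, so `known` itself is not modified
    for letter, symbol in zip(word.lower(), window):
        if symbol_to_letter.setdefault(symbol, letter) != letter:
            return False
    return True


def candidates(sequence, dictionary, known, min_len=5):
    """All (start, word) placements compatible with the fixed letters."""
    return [
        (start, word)
        for start in range(len(sequence))
        for word in dictionary
        if len(word) >= min_len and fits_with_known(word, sequence, start, known)
    ]


# Hypothetical setting: '+' is 's'; three symbols are fixed by the 5-gram.
toy_sequence = ["+", "e1", "x", "y", "c1", "t1"]
known = {"+": "s", "e1": "e", "c1": "c", "t1": "t"}
toy_words = ["select", "secret", "doctor"]
print(candidates(toy_sequence, toy_words, known))
# [(0, 'select')]
```

Every fixed symbol removes most of the dictionary at once, which is the effect the Excel formulas achieve with their chained conditions.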
The results were astonishing:
For example, if you select the letter 'S' in combination with the 5-gram 'THECO', only 79 words (from the whole dictionary) actually match the pattern structure of sequence no. 2 (at any spot). Based on a dictionary of 50,000 words, this is a ratio of 79:50,000, which excludes 99.84% of all words in the dictionary (with that specific setting). All of this is based on the method from NSA Friedman's 'Military Cryptanalysis'.
Following the previous example, if you use 'S' and 'THECO', you can partially complete sequence no. 2 with e.g. the word 'SEDUCTRESS', or with any of 78 other words such as 'DOCTOR' or 'PASSPORT'. You can try this in Oranchak's webtoy, too. It is not possible, however, to enter the word 'CROSSROAD'. That word only becomes possible if you choose the 5-gram 'CTION' instead of 'THECO'.
That only a very small number of words is found under such a pre-setting is quite satisfying. It opens at least a chance to crack that cipher.
Conclusion:
Considering the most likely 5-grams (e.g. 30) in sequence no. 1 results in approximately 2,400 potential cleartext phrases (for each letter chosen for '+'), each of course including at least one full word of length >4. It is possible that one of the examples already found (as discussed above) was indeed once written by Zodiac.
In other words:
Enter one letter for the '+' symbol, add a frequent 5-gram for sequence no. 1, and with a good chance you get some potential cleartext (a list of only 50-100 words). Given such a large dictionary (50,000+ words), other cleartext is unlikely (assuming a standard homophonic substitution).
Or described as sort of a promise:
If you give me the correct letter for the '+' symbol as well as the correct 5-gram for sequence no. 1, you will receive 23-26% cleartext of the 340 cipher in return (will be explained later), together with a list of approximately 50-100 alternatives that may match the pattern of sequence no. 2, too.
For Excel, imo, this is quite surprising. We may estimate:
At most about 10 letters for the '+' symbol (I believe in 'S' or 'L', but OK, at least it would be a frequent one). An estimated 200 different 5-grams consisting of repeating, thus frequent, trigrams (overlapping each other). Approximately 70-100 results per configuration. All this leads us to approx.
10 x 200 x 70 = 140,000
potential settings. Those may further be used for additional computation (e.g. cross-checking with another sequence of the cipher, continuing with pattern matching, trial & error etc.). As an optimist, however, I'd say only 2 letters come into question for the '+' symbol and a maximum of 60 different 5-grams may be needed, leading us to:
2 x 60 x 70 = 8,400
potential settings, one of them leading to Z's cleartext. Optimism, however, is just a lack of information.
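The two estimates are plain products of the three guesses, so they are easy to redo with other numbers (for instance 100 instead of 70 results per configuration). A tiny Python sketch, with a made-up helper name:

```python
def potential_settings(plus_letters, five_grams, results_per_config):
    """Size of the search space: '+' letters x 5-grams x words per setting."""
    return plus_letters * five_grams * results_per_config


print(potential_settings(10, 200, 70))  # 140000 (cautious estimate)
print(potential_settings(2, 60, 70))    # 8400 (optimistic estimate)
```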
With the correct setting, depending on the position and length of the word, approximately 80 to 90 homophones of the Z340 are then 'solved'.
This equals 23%-26% of the cipher! Unless you have an IQ of 170+ or are the world champion in Scrabble, additional computation is most likely needed to then convert the complete cipher into cleartext. Needless to say, if you have chosen the wrong settings from the beginning (e.g. the wrong 5-gram), Z will continue to be 'crackproof' until you have found the right setting.
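The percentage is simply the number of solved positions divided by the 340 symbols of the cipher. As a short Python check (the helper name is made up):

```python
CIPHER_LENGTH = 340  # number of symbols in the Z340


def solved_share(solved_positions):
    """Share of the cipher, in percent, covered by the solved symbols."""
    return 100 * solved_positions / CIPHER_LENGTH


for n in (80, 90):
    print(n, round(solved_share(n), 1))
# 80 23.5
# 90 26.5
```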
For those interested, here is what many of the formulas used actually look like (German-language Excel; this specific one actually found the word 'seductress'. No need to understand the formula, as it is shown here out of context):
=WENN(UND(LÄNGE($Y19510)>11;$A19510=$I19510;$A19510=$AB$1;$B19510=RECHTS(LINKS(AY$1;3);1);$E19510=RECHTS(LINKS(AY$1;4);1);$F19510=LINKS(AY$1;1);$D19510=$L19510;$G19510=$K19510;$I19510=$J19510);$Y19510;WENN(UND(LÄNGE($Y19510)=11;$A19510=$I19510;$A19510=$AB$1;$B19510=RECHTS(LINKS(AY$1;3);1);$E19510=RECHTS(LINKS(AY$1;4);1);$F19510=LINKS(AY$1;1);$G19510=$K19510;$I19510=$J19510);$Y19510;WENN(UND(LÄNGE($Y19510)=10;$A19510=$I19510;$A19510=$AB$1;$B19510=RECHTS(LINKS(AY$1;3);1);$E19510=RECHTS(LINKS(AY$1;4);1);$F19510=LINKS(AY$1;1);$I19510=$J19510);$Y19510;WENN(UND(LÄNGE($Y19510)=9;$A19510=$I19510;$A19510=$AB$1;$B19510=RECHTS(LINKS(AY$1;3);1);$E19510=RECHTS(LINKS(AY$1;4);1);$F19510=LINKS(AY$1;1));$Y19510;WENN(UND(LÄNGE($Y19510)>5;LÄNGE($Y19510)<9;$A19510=$AB$1;$B19510=RECHTS(LINKS(AY$1;3);1);$E19510=RECHTS(LINKS(AY$1;4);1);$F19510=LINKS(AY$1;1));$Y19510;WENN(UND(LÄNGE($Y19510)=5;$A19510=$AB$1;$B19510=RECHTS(LINKS(AY$1;3);1);$E19510=RECHTS(LINKS(AY$1;4);1));$Y19510;""))))))
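For anyone who does not read German Excel: WENN is IF, UND is AND, LÄNGE is LEN, LINKS is LEFT and RECHTS is RIGHT. Below is a rough Python reading of the formula. It rests on assumptions that the formula alone does not confirm: that columns A to L hold letters 1 to 12 of the word in column Y, that AB1 holds the letter chosen for '+', and that AY1 holds the 5-gram. The function name is made up, and Excel's case-insensitive comparison is imitated by upper-casing everything.

```python
def formula_match(word, plus_letter, five_gram):
    """Return the word if it passes the checks of the formula, else ''."""
    w = word.upper()
    g = five_gram.upper()
    n = len(w)
    if n < 5:
        return ""
    checks = [
        w[0] == plus_letter.upper(),  # $A = $AB$1 (letter chosen for '+')
        w[1] == g[2],                 # $B = 3rd letter of the 5-gram
        w[4] == g[3],                 # $E = 4th letter of the 5-gram
    ]
    if n > 5:
        checks.append(w[5] == g[0])   # $F = 1st letter of the 5-gram
    if n >= 9:
        checks.append(w[0] == w[8])   # $A = $I
    if n >= 10:
        checks.append(w[8] == w[9])   # $I = $J
    if n >= 11:
        checks.append(w[6] == w[10])  # $G = $K
    if n > 11:
        checks.append(w[3] == w[11])  # $D = $L
    return word if all(checks) else ""


print(formula_match("seductress", "S", "THECO"))        # seductress
print(repr(formula_match("seductress", "S", "CTION")))  # ''
```

Read this way, the longer the word, the more repeated symbols of the sequence it has to respect, which is why the formula branches on the word length.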
QT