Cipher challenge

Cipher challenge

Postby doranchak » Mon Oct 05, 2015 10:29 pm

My experiments to produce Z340-like ciphers are starting to bear fruit.

Here's an example cipher - I challenge anyone to try to break it. It is a standard homophonic substitution cipher, read and enciphered in the normal reading direction.

Code: Select all
9%(1UyKbjJ.#5BR+3
+28@TSp1l-^NBtHER
+B|JLY8OzFR(4>bl*
VLk+FU2)^RJ/c5.DO
zB(WH8MNR+|c+.cO6
|5FU+<+RJ|*b.cVOL
|5FBc)T(ZU+7XzR+k
>+lpyV)D|(#kcNz):
68Vp%CK-*<WqC2#pc
-Ff2B9+>;ZlCP^BU-
7tLRd|D5.p9O)*ZM6
Bctz:&yVOp%<K+>^C
FqNLPp*-WfzZ2d7;k
l<S^+/|dT9f4YK+WG
j4EyM+WAlH#+VB+L<
z|4&+OkNpB1V2Ff/)
z+Mp_*(;KSp2(TGO+
FBcMSEG3dWKc.4_G5
pDCE4GyTY+_BAdP2p
|+tFMPHYGK+F6pX^2


What makes these generated ciphers unique is that they share very many qualities with the actual Z340:

  • Very similar symbol distribution
  • Same number of repeated bigrams/trigrams
  • Appearance of pivots
  • Top 20 L=2 homophone cycles have the same "strength" as the top 20 from Z340
  • Same "prime phobia" feature of + and B
  • Simulated transcription errors (Up to 20 polyphones where the alternate symbol assignments come from set of visually similar symbols)
  • Same number of repeated ngrams when even (or odd) positions removed.
  • Same absence of repeated symbols on lines 1-3, 11-13
  • Matches Jarlve's non-repeat measurement of the Z340 (4462)
  • Same appearance of word-like symbol sequences ("zodiak", "her", "god")
  • Same phenomenon of a trigram repeating in the same column, and that intersects with another repeating trigram
  • Similar appearance of "box corner" patterns
  • Similar appearance of "fold marks" on line 10
  • Misspellings and missing words in the plaintext
  • There is a section of filler

Here's a visual of some of the patterns in the generated cipher that are similar to Z340:
Image

The hint for the plaintext is that it comes from a Gilbert and Sullivan play.

I fed the cipher into zkdecrypto and azdecrypt. They each were able to recover around 60% to 65% of the plaintext.

A 10-symbol section of the cipher is filler. And there are 20 intentional transcription errors. So, that contributes to 9% of the cipher text being messed up. On top of that there are several intentional misspellings. These factors are probably making it difficult to automatically recover more of the plaintext.

Another hint: Because of the poetic nature of the plain text, some phrases appear more often than normal. Perhaps that is also increasing the difficulty.
User avatar
doranchak
 
Posts: 1673
Joined: Thu Mar 28, 2013 5:26 am

Re: Cipher challenge

Postby traveller1st » Tue Oct 06, 2015 1:26 am

Got it.
Image
"I don’t know Chief, he’s very smart or very dumb."
User avatar
traveller1st
 
Posts: 2856
Joined: Wed Mar 27, 2013 8:08 pm
Location: Northern Ireland

Re: Cipher challenge

Postby Jarlve » Tue Oct 06, 2015 3:44 am

You managed to reproduce allot of patterns!

Do these patterns just emerge or are there certain things in your cipher generation process that increase the odds of it happening? I'm guessing the repeating (poetic) nature of the plaintext contributes to some of these patterns.

Some discrepancies that I find worthy of note.

Although it matches the non-repeat score, your cipher appears much more random than the 340 (based on other measurements).

2-symbol cycle score (percentual difference vs. randomized mean) 340: 178%, yours: 125%.
3-symbol cycle score (percentual difference vs. randomized mean) 340: 180%, yours: 124%.

Also individual distribution of symbols throughout your cipher is much more random.

The following image is a non-repeat graph comparison between the 340 and your cipher. For your cipher there is some extra bulk towards right side of the graphs, I believe this sub-peak may have happened because of hill climbing versus top 20 cycles. Could alter the non-repeats measurement to give less weight (logaritmic?) to longer unique strings.

Image
User avatar
Jarlve
 
Posts: 1453
Joined: Sun Sep 07, 2014 9:51 am
Location: Belgium

Re: Cipher challenge

Postby doranchak » Tue Oct 06, 2015 5:20 am

Nice job, trav and Jarvle, for quickly solving the cipher. Do you think you would have gotten it without the hints?

Here's another one. Should be easy to solve.

Code: Select all
RT;%cqtp/+2F4Wk*|
5FBcK9O#y)HER6SW8
<+JlFM(2/k45R.zZ)
d9|k2C52WT&+zc9|L
GG+tLf>W^zjBd>6<c
4N#OVDA+zSYCZ1NB.
F^pNY+OT+@OpA69+7
*L(MdVc.bUXRKl2#B
O+5RlcHt8B4|MpD5T
-NBJ+.2^+|ylCp2<-
O+UpMbdSY1#j<Ff)z
pZ>KEUFL)V(&zN+:c
B+k6yl3MO(Cb*|JRH
zLq_pl+4GlCX|3.*-
OM#>_c2H;D6BJ4^|T
5(ktKP+)GG%dR(^8B
-KWVL.;+FBcSPFDy6
2B/pKP|MZY1f+UcF+
<+8p:-*fB/Ez+73^|
5FpO<(V7+yOG*KzR_


Image

Jarlve wrote:Do these patterns just emerge or are there certain things in your cipher generation process that increase the odds of it happening?

My algorithm is divided into two stages. In the first stage, it identifies plain texts that fulfill some of the criteria needed for specific patterns to appear (trigrams in the right place, pivots, fold marks, box corners). This is also the stage where misspellings, omitted/duplicated words, and filler are randomly introduced. In the second stage, a genetic algorithm conducts a multiobjective search to optimize each of the remaining objectives. So the algorithm isn't intended to discover how often the curious patterns happen as a side effect of encipherment. Instead, it is more to answer the question: Under our construction hypothesis, is it possible for generated ciphers to resemble Z340 and its curious phenomena? And is it possible for us to effectively solve the generated ciphers? If so, this may provide stronger evidence against specific construction methods.

Thanks for your analysis of discrepancies. I'd like to identify as many of those as possible. Do you have a link to where you describe how you compute your cycle scores?

Since my algorithm only cares about the "top 20" cycles of 2 symbols, that might explain why there's a big difference from the Z340. It may also provide more evidence that the apparent cycling in Z340 is truly information about the encipherment process "leaking" into our observations.

I wanted to compute a more thorough cycle comparison but it slows down the search considerably due to all the combinations of symbols involved. Perhaps I can remedy this simply by expanding from top 20 to to 50 or similar.

Jarlve wrote:Also individual distribution of symbols throughout your cipher is much more random.


Is that conclusion based on your non-repeat graph or something else? I wonder if the graph will look more similar if I include more than 20 cycles in the comparison. Or if I also include 3-symbol cycles. I will do that if it doesn't slow the multiobjective search down to a crawl. :)
User avatar
doranchak
 
Posts: 1673
Joined: Thu Mar 28, 2013 5:26 am

Re: Cipher challenge

Postby Jarlve » Tue Oct 06, 2015 6:14 am

doranchak wrote:Is that conclusion based on your non-repeat graph or something else?

I have to go but can share this measurement for now.

Code: Select all
A=summed_positions_per_symbol/symbol_frequency
B=total_chars_in_cipher/2
C+=abs(A-B)
repeat above for all symbols
score=C/B*340

In the 340 the first "+" symbol appears on position 20 and these positions are summed to get A. I also have the same measurement but then with 2D (grid) logic. It can for instance be used to determine the correct order of a cyclic h.s. cipher in parts (such as the 408 and jroberson) because the correct order typically scores lower. A higher score is more random. So h.s. ciphers will score lower, it's another measurement that I made with homophonic substitution in mind so it's more of the same really.

340: 1599
408: 1146
yours 1: 2644
yours 2: 2087
User avatar
Jarlve
 
Posts: 1453
Joined: Sun Sep 07, 2014 9:51 am
Location: Belgium

Re: Cipher challenge

Postby doranchak » Tue Oct 06, 2015 7:32 am

Thanks for that measurement description. It is an ideal addition to my set of objectives since it is fast to compute.

I'm trying to gain an intuitive sense of what the calculation represents; let me know if it is right:

A=summed_positions_per_symbol/symbol_frequency <== this appears to represent a "clustering" measurement. For instance, write the cipher in a straight line, and remove everything that isn't a + symbol. This measurement would represent where you'd hold the line to balance all 24 +'s.

B=total_chars_in_cipher/2 <== I assume this is a midpoint of the line described above.

C+=abs(A-B) <== At each step, this measures how far a given symbol is from the expected balance point if the symbol was uniformly distributed.

repeat above for all symbols
score=C/B*340 <== A ratio of the sum of differences to the midpoint of the cipher, normalized by the cipher length.
User avatar
doranchak
 
Posts: 1673
Joined: Thu Mar 28, 2013 5:26 am

Re: Cipher challenge

Postby daikon » Tue Oct 06, 2015 1:15 pm

I'm getting a stable solve for both ciphers as well using my own solver, but with a pretty low overall score, which means the recovered plaintext isn't represented well in the N-grams corpus. Probably because of the introduced transcription errors and misspellings, and seemingly "arcane" subject of the plaintext didn't help either. :) But I'm pretty sure I can clean it up manually to get closer to the intended plaintext. Now if only you can figure out how to mimic the near absence of bigram repeats in the second half, and for symbols in odd positions, we'd be all set. :)
User avatar
daikon
 
Posts: 178
Joined: Thu Jul 02, 2015 7:04 pm

Re: Cipher challenge

Postby Jarlve » Tue Oct 06, 2015 4:18 pm

doranchak wrote:Under our construction hypothesis, is it possible for generated ciphers to resemble Z340 and its curious phenomena? And is it possible for us to effectively solve the generated ciphers? If so, this may provide stronger evidence against specific construction methods.

I really like this idea, it must be fun and exciting but also quite a bit of work.

doranchak wrote:Do you have a link to where you describe how you compute your cycle scores?

The idea is based on my interpretation of smokie's cycle measurement. Measure every n-symbol cycle in the cipher in some way. I feel it's better for it to be not too exponential in nature. For instance "ABABABABAB" should not have twice the score of "ABABABAB". Depending on how you wish to measure the cycles some normalization matching symbols with different counts may be needed. Also could consider expected cycle length. My algorithm repeats this for 1000 randomizations of the cipher string and then calculates the percentual difference between the mean of the randomizations and the actual cipher.

doranchak wrote:I wanted to compute a more thorough cycle comparison but it slows down the search considerably due to all the combinations of symbols involved. Perhaps I can remedy this simply by expanding from top 20 to to 50 or similar.

Yes it's typically very slow, I'll think about it.

doranchak wrote:Thanks for that measurement description. It is an ideal addition to my set of objectives since it is fast to compute.

You understand correctly. My current implementation of the idea is certainly not perfect and some improvements could be made.
User avatar
Jarlve
 
Posts: 1453
Joined: Sun Sep 07, 2014 9:51 am
Location: Belgium

Re: Cipher challenge

Postby traveller1st » Tue Oct 06, 2015 5:18 pm

doranchak wrote:Nice job, trav and Jarvle, for quickly solving the cipher. Do you think you would have gotten it without the hints?


To answer that specific question ... not sure. It would have certainly taken longer. It's kinda tricky to be sure because I basically took what I assumed was the longest section of deciphered plaintext and did a search based off that. Then narrowed it down to what I felt was the correct section and then continued from there working back through the ciphertext inputing the plaintext. So I was sorta cheating, sorta not, 100% lazy lol. Not sure without a reference if I could have made sense of it so readily.
Image
"I don’t know Chief, he’s very smart or very dumb."
User avatar
traveller1st
 
Posts: 2856
Joined: Wed Mar 27, 2013 8:08 pm
Location: Northern Ireland

Re: Cipher challenge

Postby doranchak » Tue Oct 06, 2015 5:26 pm

Jarlve wrote:I really like this idea, it must be fun and exciting but also quite a bit of work.


It is very tedious. And I haven't even moved on to other construction hypotheses yet (a nightmarish endless pit of possibilities). I'm still only focused on tuning the optimizations for normal homophonic substitution but I wanted to try to rule out Dan Olson's hypothesis (splice together lines 1-3, 11-13), but the high multiplicity of those lines might be against me.

Jarlve wrote:The idea is based on my interpretation of smokie's cycle measurement. Measure every n-symbol cycle in the cipher in some way. I feel it's better for it to be not too exponential in nature. For instance "ABABABABAB" should not have twice the score of "ABABABAB". Depending on how you wish to measure the cycles some normalization matching symbols with different counts may be needed. Also could consider expected cycle length. My algorithm repeats this for 1000 randomizations of the cipher string and then calculates the percentual difference between the mean of the randomizations and the actual cipher.


Makes sense. I like the idea of comparing them to randomizations as a reference point.

Jarlve wrote:
doranchak wrote:I wanted to compute a more thorough cycle comparison but it slows down the search considerably due to all the combinations of symbols involved. Perhaps I can remedy this simply by expanding from top 20 to to 50 or similar.

Yes it's typically very slow, I'll think about it.


For my multiobjective search I implemented a slightly faster cycle detection. The basic idea is to keep track of a list of perfect cycles encountered while doing one pass of the cipher. As it scans the cipher, it tracks a list of "active" and "inactive" cycles. At first, the list of active cycles is large. But as breaks in the cycles are found, they are removed and placed in a list of inactive cycles. So, the list of active cycles soon dwindles down to a reasonable size as you progress further into the cipher text. I think this is faster than the naive method of calculating every possible combination of n symbols and isolating the resulting sequences. My approach is similar to the King and Bahler method described in their REMOVE_HOMOPHONES algorithm, but does not yet take advantage of their other optimizations. Also, the weakness in this approach is that the cycles have to start out perfectly -- there's no accommodation for perfect cycles that appear after a first section of "junk".
User avatar
doranchak
 
Posts: 1673
Joined: Thu Mar 28, 2013 5:26 am

Next

Return to Zodiac Cipher Mailings & Discussion

Who is online

Users browsing this forum: No registered users and 0 guests

cron