Unigram distance curiosity

Re: Unigram distance curiosity

Postby smokie treats » Sat Oct 14, 2017 6:43 am

Jarlve wrote:What is the cause of a reasonably large group of symbols not appearing in the middle 3rd of the cipher?

I don't know but I am going to work on this idea further.
smokie treats
 
Posts: 1620
Joined: Thu Feb 19, 2015 1:34 pm
Location: Lawrence, Kansas

Postby Jarlve » Sat Oct 14, 2017 6:58 am

doranchak wrote:That might be an effective method, but I also wonder if it is a side effect of the procedure that weakened the cycles and/or bigrams.

In that case I think it would be more likely to have been caused by a symbol thing rather than transposition.

If Zodiac used sequential homophonic substitution for a reason (rather than picking homophones at random), then the only thing I can think of is that it was to hide unigram repeats over short/medium distances. In the 340 there are very few unigram repeats over short distances, especially when taking into consideration its higher IoC per cipher length than the 408. This is why we see 9 rows which have no repeats; it is not easily connected to transposition after/during encoding and typical encoding randomization.

So I came to the conclusion that if Zodiac tried to hide unigram repeats, he could have done it another way: by not repeating symbols within a given window of his view, without having to keep track of the cycles. The statistics of the 340 seem to agree: low unigram repeats over short/medium distances, a high unique sequence peak of 26 at length 17, and no apparent homophonic sequences such as in the 408. A silver bullet.

Dan Olson wrote:The higher randomness may be due in part or whole to greater care by the writer to not repeat characters on these lines.
Jarlve
 
Posts: 2544
Joined: Sun Sep 07, 2014 9:51 am
Location: Belgium

Postby smokie treats » Sat Oct 14, 2017 9:11 am

Basically, I am Zodiac and encoding along. I say, "Have I used symbol X in the last row?". If the answer is yes, and I want to encode another plaintext that X maps to, I will use symbol Y instead, which also maps to that plaintext.
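
A minimal sketch of that encoding habit, as one possible reading of it. The function name, the `key` layout (plaintext letter mapped to a list of homophones), and the `window` of 17 positions standing in for "the last row" are all assumptions for illustration, not anything confirmed about how the 340 was made.

```python
import random

def encode_avoiding_recent(plaintext, key, window=17, seed=0):
    """Homophonic encoding that prefers symbols not used recently.

    key maps each plaintext letter to a list of homophones (hypothetical layout).
    window is how many previous ciphertext positions count as "the last row".
    """
    rng = random.Random(seed)
    cipher = []
    for letter in plaintext:
        options = key[letter]
        recent = set(cipher[-window:]) if window > 0 else set()
        # Homophones for this letter that were not seen in the recent window.
        fresh = [s for s in options if s not in recent]
        # If every homophone was used recently, fall back to any of them.
        cipher.append(rng.choice(fresh if fresh else options))
    return cipher
```

No cycle bookkeeping is needed here: the encoder only looks back over the recent window, so a letter with a single homophone will still repeat whenever it has to.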

New spreadsheet: I can now work with a message of any shape.

I slide LRTB from position to position, and RLBT from position to position at the same time. At each position, I iterate through all of the symbols and ask: does the symbol occur below the leftmost position and above the rightmost position, but not in between? If so, I count +1, and I also count the number of positions occupied by the symbols that meet those criteria.

The 340 as is. There is a jump on row 6. Basically, the top chart shows that there are 10 symbols that appear exclusively on rows 1-6 and 15-20. The bottom chart shows that the symbols occupy a total of 39 positions. Then both graphs level out on row 7, and start climbing again on row 8.
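
A rough sketch of the row-level version of this count, for anyone who wants to try it outside a spreadsheet. It assumes the message is a list of rows and that a symbol only counts if it appears in both the top block and the bottom block while being absent from the rows in between; that "both blocks" requirement is my reading of the description, and `outer_exclusive` is a made-up name.

```python
from collections import Counter

def outer_exclusive(grid, k):
    """Count symbols found in the top k rows AND the bottom k rows
    but never in the rows between them.

    Returns (number of such symbols, number of positions they occupy).
    Assumes 2 * k <= len(grid).
    """
    top = Counter(s for row in grid[:k] for s in row)
    bottom = Counter(s for row in grid[len(grid) - k:] for s in row)
    middle = {s for row in grid[k:len(grid) - k] for s in row}
    exclusive = [s for s in top if s in bottom and s not in middle]
    positions = sum(top[s] + bottom[s] for s in exclusive)
    return len(exclusive), positions
```

For a 20-row message, `outer_exclusive(grid, 6)` would correspond to rows 1-6 and 15-20, and calling it for each `k` in turn gives the two curves (symbols and positions) row by row.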

unigram distance phenomenon 3.png

I can set the threshold position to highlight, here at 102.

unigram distance phenomenon 4.png

Postby smokie treats » Sat Oct 14, 2017 9:24 am

The 340, but rearranged so that the left column becomes the top row and the top row becomes the left column and the message is 20 x 17. Rotated 90 degrees and then mirrored.
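
In code terms this rearrangement is a plain transpose. A small sketch, assuming the message is stored as a list of equal-length rows (the function name is just illustrative):

```python
def transpose(grid):
    """Turn columns into rows: the left column becomes the top row
    and the top row becomes the left column."""
    return [list(col) for col in zip(*grid)]
```

A grid that is 17 wide and 20 tall comes out 20 wide and 17 tall, so the same row-based measurement can then be run on the sideways message.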

At 5 rows (100 positions, almost exactly the same area as the 102 positions of 6 rows in the 17 x 20 layout), there are only four symbols exclusive to rows 1-5 and 13-17, and they occupy only sixteen positions. That is far fewer.

unigram distance phenomenon 5.png


unigram distance phenomenon 6.png


Basically, there are more than twice as many symbols and positions occupied exclusive to the top and bottom 6 rows as compared to the top and bottom 5 rows with the message turned sideways.

Postby smokie treats » Sun Oct 15, 2017 6:21 pm

Jarlve,

Here is a message that you may solve if you want to. It is probably not like the 340 in a lot of respects. I haven't checked all of the stats. The only goal was to see if I could generate a message using two different keys and with a bigram repeat spike and a lot of symbols and positions occupied that are exclusive to rows 1-6 and 15-20.

1. Message from library
2. Transposed at period 20. Inscription 17 x 20 LRTB, reading TBLR, and transcription into a 17 x 20 LRTB
3. Alberti cipher with rows 1-6 with key #7, rows 7-14 with key #15, and rows 15-20 with key #7 again
unigram distance phenomenon 7.png

4. Homophonic encoding, with one high count polyphone
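
Step 2 of the list above can be sketched roughly as follows. This covers only the transposition; the Alberti and homophonic steps depend on keys that are not given here. The function name and the list-of-rows output are assumptions for illustration.

```python
def transpose_period(text, width=17, height=20):
    """Inscribe text row by row (LRTB) into a width x height grid,
    read it off column by column (TBLR), then write that stream
    back into rows of the same width (LRTB)."""
    if len(text) != width * height:
        raise ValueError("text must fill the grid exactly")
    rows = [text[r * width:(r + 1) * width] for r in range(height)]
    # Read each column top to bottom, taking columns left to right.
    stream = [rows[r][c] for c in range(width) for r in range(height)]
    return [stream[r * width:(r + 1) * width] for r in range(height)]
```

With 20 rows, two letters that were adjacent in a plaintext row end up 20 positions apart in the output stream, which is where the period of 20 comes from.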

55 26 53 22 28 46 12 35 5 27 57 23 24 54 37 22 23
31 47 24 32 48 46 53 22 43 54 25 13 33 29 20 34 23
14 38 21 24 15 20 26 47 1 44 53 22 21 54 9 45 23
22 12 13 56 24 58 20 16 14 43 44 45 59 55 18 17 48
22 22 6 3 40 19 30 18 28 57 31 15 32 29 21 30 33
46 2 41 28 53 23 24 16 29 25 58 22 47 43 48 44 23
17 42 39 19 15 16 22 7 4 10 40 41 11 10 46 45 47
18 48 43 49 12 42 59 22 50 17 5 15 49 37 38 6 51
57 7 50 16 17 5 40 31 24 39 52 41 58 1 51 32 37
15 2 36 6 11 42 13 49 59 20 3 33 35 14 38 40 44
4 39 7 50 16 22 5 46 22 8 41 57 42 37 12 58 17
31 38 6 32 7 5 40 49 33 1 41 39 22 37 6 13 7
50 52 42 10 15 11 3 8 10 59 40 2 31 16 32 11 51
22 14 4 52 1 10 56 5 33 38 41 23 42 6 8 3 49
45 36 4 24 22 22 27 17 55 12 57 39 56 43 30 21 58
50 20 22 21 37 23 20 38 28 29 47 30 39 28 37 13 24
22 26 29 31 30 55 38 32 54 28 56 23 14 44 53 39 24
59 7 21 45 22 23 24 22 22 48 43 57 33 55 46 44 15
12 23 24 29 58 59 30 13 54 53 22 28 47 27 48 40 57
58 20 46 45 9 43 14 29 30 22 25 56 19 54 53 35 16

I still have work to do, and this is preliminary. It only took me 15 tries to generate. All perfect cycles. There should be a bunch of symbols exclusive to rows 1-6 and 15-20.

Postby smokie treats » Mon Oct 16, 2017 12:13 am

Here is a better one.

Same cipher, but the message has more comparable cycles, with 25% randomization.

Rows 1-6 encoded with Alberti row 21, rows 7-14 with Alberti row 1, and rows 15-20 with Alberti row 21.

It wasn't difficult to make. I think that the spike at P20 is not as difficult to make as I thought it would be because there are bigrams that are different in the differently Alberti encoded parts of the message, but they get Alberti encoded the same.

unigram distance phenomenon 8.png

5 56 20 34 29 3 12 1 30 57 32 20 6 7 27 13 51
30 4 44 52 56 51 5 54 21 54 6 55 28 12 52 27 14
19 51 19 19 20 28 21 7 19 16 50 30 45 17 25 19 5
33 3 54 4 47 18 20 28 16 17 50 28 49 54 3 25 52
6 15 45 50 7 27 55 19 13 29 53 21 19 32 49 54 26
7 31 5 28 55 24 19 18 44 51 25 40 4 50 29 30 12

38 31 42 19 33 48 43 35 19 42 15 30 21 10 18 11 32
17 23 42 19 42 2 23 13 20 33 39 43 18 42 8 13 27
1 38 30 21 19 7 3 41 19 43 34 9 40 32 53 20 39
21 13 33 47 10 48 19 32 16 11 9 19 18 33 45 10 28
2 41 1 30 19 47 11 2 10 15 49 2 42 20 50 2 46
1 19 19 30 9 20 2 5 11 43 8 39 4 1 39 18 9
38 42 39 25 57 19 8 21 43 24 46 9 33 10 40 44 27
2 1 19 20 42 43 5 10 10 42 41 21 26 24 19 34 38

44 29 20 49 18 7 30 3 25 4 3 17 21 31 18 28 19
27 25 44 29 45 26 30 13 14 54 54 44 44 20 31 5 6
45 50 40 4 53 15 5 13 53 54 20 5 24 36 41 25 12
22 16 16 13 55 37 21 28 46 20 19 52 26 24 29 27 3
6 46 51 19 30 2 13 18 20 33 53 17 17 54 7 31 52
45 36 21 18 49 19 50 25 32 19 55 33 13 17 37 36 26


The symbol count distribution is very similar too. And there are 41 spaces occupied by symbols exclusive to rows 1-6 and 15 - 20.

unigram distance phenomenon 9.png

Postby doranchak » Tue Oct 17, 2017 11:49 am

doranchak wrote:Computed this way, the row-swapped Z340 has about double the average sigma as the unmodified Z340. You can see this looking at individual cycles, where the overall relative probabilities are, on average, less than those of the Z340. Here is the raw output of cycles of both ciphers for comparison:

https://docs.google.com/spreadsheets/d/ ... sp=sharing

It shows all L2 cycles detected for both ciphers, side by side, in decreasing order by estimated probability. When you scroll down, you'll notice an emerging trend for cycles to become more improbable in the row-swapped Z340. Look at the "How much more improbable" column. Positive values indicate an increase in improbability. There are also 172 extra cycles in the row-swapped Z340 compared to the original.


I added L3 cycles to that spreadsheet.

https://docs.google.com/spreadsheets/d/ ... sp=sharing

Click on the L=3 tab at the bottom of the spreadsheet. The modified cipher has more improbable cycles than the original, and has about 2,4000 extra cycles that appear (by which I mean cycles that have a minimum run length of 2).

I think it's interesting that the average statistical significance of individual cycles has gone up for the entire pile of detected cycles. I still have no idea what it means. Maybe it's just easy to make the cycles behave this way with simple manipulations of the cipher text.
doranchak
 
Posts: 2360
Joined: Thu Mar 28, 2013 5:26 am

Postby Jarlve » Wed Oct 18, 2017 11:40 am

Just a quick rundown: by my measurements, your modified cipher has increased 2- and 3-symbol cycles and midpoint shift score, while perfect 2-, 3-, and 4-symbol cycles, unigram distance, sliding unigrams, appearance, and unique sequences decreased. If some row order needs to be restored, a reasonably simple hill climber should do the trick, since 20! is not a very large search space for a hill climber with some speed. It will return false positives, but at least it may hint at what kind of improvements are possible. It would also test your measurement: if your measurement is too exponential (the whole in the part), the hill climber will come up with silly things.
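
To make the hill climber idea concrete, here is a bare-bones sketch. It is not my solver, just the simplest version of the technique: swap two rows at random and keep the swap only if the score improves. The `score` callable is whatever measurement is being tested (hypothetical here), with higher meaning better.

```python
import random

def hill_climb_rows(rows, score, iterations=10000, seed=0):
    """Greedy search over row orderings using random pairwise swaps.

    rows needs at least two entries; score takes a list of rows
    and returns a number to maximize.
    """
    rng = random.Random(seed)
    best = list(rows)
    best_score = score(best)
    for _ in range(iterations):
        i, j = rng.sample(range(len(best)), 2)
        best[i], best[j] = best[j], best[i]
        candidate = score(best)
        if candidate > best_score:
            best_score = candidate
        else:
            # No improvement: undo the swap.
            best[i], best[j] = best[j], best[i]
    return best, best_score
```

A climber this simple can stall in local optima, so in practice it would be restarted from several random orderings and the results compared.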

Postby smokie treats » Sun Oct 22, 2017 4:39 pm

Deleted.

Postby doranchak » Tue Nov 07, 2017 4:28 pm

Jarlve wrote:In the 340 there are very few unigram repeats over short distances, especially when taking into consideration its higher IoC per cipher length than the 408. This is why we see 9 rows which have no repeats; it is not easily connected to transposition after/during encoding and typical encoding randomization.

To show how dramatic this effect is, let's revisit shuffle tests.
Z408 has 6 rows with no unigram repeats. A test with 10 million shuffles reveals that about 1 in 400 shuffles of Z408 have 6 or more rows with no unigram repeats.
For Z340, only about 1 in 1.5 million shuffles have 9 or more rows which have no repeats.
Pretty strong observation, combined with the unique sequence peak of 26 at length 17 you discovered.
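
For reference, a shuffle test of this kind can be sketched as below. This is a simplified stand-in rather than the code that produced the numbers above; the function names are made up, and it assumes the cipher is a flat list of symbols laid out 17 per row.

```python
import random

def rows_without_repeats(symbols, width=17):
    """Number of rows in which no symbol appears more than once."""
    rows = [symbols[i:i + width] for i in range(0, len(symbols), width)]
    return sum(1 for row in rows if len(set(row)) == len(row))

def shuffle_fraction(symbols, width=17, trials=10000, seed=0):
    """Observed count, plus the fraction of random shuffles that
    have at least as many repeat-free rows as the original."""
    rng = random.Random(seed)
    observed = rows_without_repeats(symbols, width)
    pool = list(symbols)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pool)
        if rows_without_repeats(pool, width) >= observed:
            hits += 1
    return observed, hits / trials
```

Shuffling keeps the symbol counts fixed and only destroys their order, so a very small fraction means the arrangement itself is unusual, not the symbol frequencies.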

I did another test comparing unigrams in rows and columns. The idea is this:
Take a symbol, then count how many columns it is in. For example, there are three P symbols but they appear in only two columns.
Make these counts for every symbol, then the final measurement is the sum of the counts.

Do the same measurement, but count how many rows the symbols are in instead of columns. Then compare to 1,000,000 shuffles.
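
Here is a small sketch of that measurement, again simplified and with made-up names. Summing, for every symbol, the number of distinct rows (or columns) it appears in is the same as counting distinct (symbol, row) or (symbol, column) pairs, which is what the code does. The layout of 17 symbols per row is assumed.

```python
import random
from statistics import mean, pstdev

def spread_score(symbols, width=17, by="row"):
    """Sum over symbols of how many distinct rows (or columns) each occupies."""
    seen = set()
    for i, s in enumerate(symbols):
        line = i // width if by == "row" else i % width
        seen.add((s, line))
    return len(seen)

def sigma_vs_shuffles(symbols, width=17, by="row", trials=1000, seed=0):
    """Observed score and its distance from the shuffle mean,
    in standard deviations of the shuffle scores."""
    rng = random.Random(seed)
    pool = list(symbols)
    scores = []
    for _ in range(trials):
        rng.shuffle(pool)
        scores.append(spread_score(pool, width, by))
    observed = spread_score(symbols, width, by)
    sd = pstdev(scores)
    sigma = (observed - mean(scores)) / sd if sd else 0.0
    return observed, sigma
```

In the P example, three P symbols in two columns contribute 2 to the column score.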

The results for Z408 are:
By row: 380. This is 4.5 standard deviations above the mean of shuffles. No shuffle exceeded a score of 379.
By column: 337. This is 0.8 standard deviations above the mean of shuffles. About 1 in 4 shuffles met or exceeded this score.

For Z340:
By row: 322. This is 5.5 standard deviations above the mean of shuffles. No shuffle exceeded a score of 316.
By column: 284. This is 0.5 standard deviations below the mean of shuffles. About 10 in 14 shuffles met or exceeded this score.

So, instances of the same symbol tend to be spread out across rows in Z408 and Z340, and a bit more so for Z340.
They do not show this tendency across columns.

Also, I noticed that some low-frequency symbols seem to repeat along columns, against expectations. I mentioned this here: viewtopic.php?p=55921#p55921
But initial tests suggest it may just be a phantom effect. I still have to confirm this.