
Re: Unigram distance curiosity

Posted: Sat Nov 18, 2017 5:37 pm
by smokie treats
Here is a better visual, if you can see it:

https://drive.google.com/drive/folders/ ... vJv7urWX7e

EDIT: I could see it. The thin blue line shows the regions. More comparisons coming soon.

Re: Unigram distance curiosity

Posted: Sun Nov 19, 2017 3:46 am
by Jarlve
The visuals are fine. Do you think something special is going on besides 6-8-6?

Re: Unigram distance curiosity

Posted: Sun Nov 19, 2017 8:18 am
by smokie treats
Jarlve wrote:The visuals are fine. Do you think something special is going on besides 6-8-6?


Yes I do. I believe that 6-8-6 is just a glimpse into something more. Here is another slideshow. I chose three messages from the experiment at random, all transposition P20. I changed the color scheme to make it easier to focus on the issue. Watch the progression.

1-18-1 similar
2-16-2 similar
3-14-3 similar
4-12-4 there is a shift, with more symbol positions unique to both regions
5-10-5 more in top and bottom
6-8-6 more in top and bottom
7-6-7 more in top and bottom
8-4-8 more in top and bottom
9-2-9 similar

https://drive.google.com/drive/folders/ ... wds7LLjwLz

You were right about the middle in 6-8-6. I had a typo in one of my formulas and will shortly go back and update the data a few posts above. Having just a few symbol positions in the middle is very common.

Re: Unigram distance curiosity

Posted: Sun Nov 19, 2017 9:40 am
by smokie treats
Here is another example, this time with three randomly selected messages from the experiment that were not transposed. The comparisons are about the same.

1-18-1 similar
2-16-2 similar
3-14-3 similar
4-12-4 there is a shift, with more symbol positions unique to both regions
5-10-5 more in top and bottom
6-8-6 more in top and bottom
7-6-7 more in top and bottom
8-4-8 more in top and bottom
9-2-9 similar

https://drive.google.com/drive/folders/ ... 1MYD8jSofY

Re: Unigram distance curiosity

Posted: Sun Nov 19, 2017 12:24 pm
by Jarlve
smokie treats wrote:
Jarlve wrote:The visuals are fine. Do you think something special is going on besides 6-8-6?

Yes I do. I believe that 6-8-6 is just a glimpse into something more. Here is another slideshow. I chose three messages from the experiment at random, all transposition P20. I changed the color scheme to make it easier to focus on the issue. Watch the progression.

Here are my results under the hypothesis of a randomized plaintext plus sequential homophonic substitution with 26% random homophone selection. They line up with yours. Row divisions 4-12-4 and 6-8-6 are the most significant, that is, roughly a division by thirds. However, I currently see no reason to believe there is more to it. Can you formulate "something more" yet?

Code:
Row division: top-bottom unique unigram count, sigma
--------------------------
1: 0, -0.17
2: 0, -0.38
3: 4, 2.04
4: 15, 5.39 <---
5: 20, 4.20
6: 39, 5.15
7: 43, 2.32
8: 94, 3.14
9: 147, 0.88

Re: Unigram distance curiosity

Posted: Sun Nov 19, 2017 12:30 pm
by Jarlve
And versus randomizations of the 340:

Code:
Row division: top-bottom unique unigram count, sigma
--------------------------
1: 0, -0.20
2: 0, -0.45
3: 4, 1.02
4: 15, 2.93 <---
5: 20, 1.89
6: 39, 2.37
7: 43, 0.30
8: 94, 1.24
9: 147, -0.22

Re: Unigram distance curiosity

Posted: Sun Nov 19, 2017 3:02 pm
by smokie treats
He didn't make a key, cycle through his symbols, and select randomly from the homophone groups about 25% of the time. A person could match the L=2 cycle scores by doing that, but it is not what he did, because it would not produce the regional biases that we see. I think the regional biases are a clue to how he encoded the messages, one that we can actually see.
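For reference, the scheme being ruled out here can be sketched roughly as follows. This is only an illustration under assumptions: the `encode` function, its `random_rate` parameter, and the toy key are hypothetical, and the choice that a random pick does not advance the cycle is a guess, not something stated in the thread.

```python
import random

def encode(plaintext, key, random_rate=0.25, seed=0):
    """Sequential homophonic substitution with occasional random picks.

    key maps each plaintext letter to a string of homophones (hypothetical layout).
    With probability random_rate a homophone is chosen at random; otherwise the
    next homophone in that letter's cycle is used.
    """
    rng = random.Random(seed)
    next_index = {letter: 0 for letter in key}
    out = []
    for letter in plaintext:
        group = key[letter]
        if rng.random() < random_rate:
            # Random selection; assumed not to advance the cycle.
            symbol = rng.choice(group)
        else:
            symbol = group[next_index[letter] % len(group)]
            next_index[letter] += 1
        out.append(symbol)
    return "".join(out)

# Toy key: E has three homophones, T has two.
toy_key = {"E": "123", "T": "ab"}
cipher = encode("ETETEETE", toy_key)
```

Nothing in this procedure depends on position in the message, which is why it would not be expected to produce regional biases on its own.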

He either selected the symbols from his key in some way by position or row, or used some type of creative cycle, because the top and bottom parts share the same symbols but those symbols are not in the middle.

One possibility is that he gradually added symbols from the beginning to the middle, then gradually removed symbols from the middle to the end.

Or, closer to what we see, the reverse: gradually removing symbols from the key from the beginning to the middle, then gradually adding the removed symbols back from the middle to the end.

He could have had a key made into a grid, and selected by row, moving to one side of the grid toward the middle, then moving from the middle to the other side of the grid.

Or, maybe a cycle that overlaps itself, sort of like:

A B C D
B C D A
C D A B
D A B C
A B C D

Rough ideas.
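
One way to read the overlapping cycle above is that each pass through a homophone group starts one symbol later than the previous pass. A minimal sketch under that assumption (the `rotating_cycle` name is hypothetical):

```python
def rotating_cycle(symbols, passes):
    """Return one string per pass, each rotated one step further left."""
    n = len(symbols)
    return [symbols[p % n:] + symbols[:p % n] for p in range(passes)]

for line in rotating_cycle("ABCD", 5):
    print(" ".join(line))
```

This only reproduces the pattern as written; whether such a cycle would create the regional bias is a separate question.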

Re: Unigram distance curiosity

Posted: Sun Nov 19, 2017 4:46 pm
by doranchak
smokie treats wrote:Left column is count of symbols unique to both top 6 rows and bottom 6 rows, right column is number of messages with that count. The 340 has 39 symbol-positions. None of my messages had 39, and the most was one message that had 34. The average was only 12.

This regional bias for certain symbols continues to be interesting. I have tried to confirm the significance of the observation via one million randomizations of Z340.

Z340 has 10 symbols that occur only in the first or last six lines (Yes, I love showing this font! :D ):

[Attached image: Screen Shot 2017-11-19 at 4.43.41 PM.png]


This is a 0.83 sigma observation compared to randomizations. About 1 in 4 shuffles has at least that many symbols exclusive to those lines.

The 10 exclusive symbols occupy 39 positions of Z340. This is a 1.66 sigma observation compared to randomizations. About 1 in 16 shuffles had at least that many positions occupied by symbols exclusive to those lines.
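
A rough sketch of this kind of test, for anyone who wants to reproduce it. It assumes "exclusive" means the symbol never appears in the middle rows, and it uses a toy grid rather than the real Z340 transcription; the function names and the `trials` and `seed` parameters are hypothetical.

```python
import random
from statistics import mean, pstdev

def exclusive_stats(text, width, outer_rows):
    """Return (symbol_count, position_count) for symbols that occur only in
    the first and last outer_rows rows of a grid that is width columns wide."""
    rows = [text[i:i + width] for i in range(0, len(text), width)]
    outer = "".join(rows[:outer_rows] + rows[len(rows) - outer_rows:])
    middle = "".join(rows[outer_rows:len(rows) - outer_rows])
    exclusive = set(outer) - set(middle)
    positions = sum(1 for ch in outer if ch in exclusive)
    return len(exclusive), positions

def sigma_vs_shuffles(text, width, outer_rows, trials=1000, seed=0):
    """Standard deviations by which the observed position count exceeds the
    mean over random shuffles of the same text."""
    rng = random.Random(seed)
    observed = exclusive_stats(text, width, outer_rows)[1]
    chars = list(text)
    samples = []
    for _ in range(trials):
        rng.shuffle(chars)
        samples.append(exclusive_stats("".join(chars), width, outer_rows)[1])
    sd = pstdev(samples)
    return (observed - mean(samples)) / sd if sd else 0.0

# Toy 4x4 grid: A, B, C, D appear only in the first and last row.
toy = "ABCD" "EFEF" "EFEF" "ABCD"
counts = exclusive_stats(toy, 4, 1)
sigma = sigma_vs_shuffles(toy, 4, 1, trials=200)
```

For Z340 the equivalent call would use a width of 17 and `outer_rows=6`; sweeping `outer_rows` from 1 to 9 gives the 1-18-1 through 9-2-9 divisions.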

I am curious about other regions that show symbol exclusivity, such as the middle 8. I am going to try to run a brute force search for exclusive symbol counts for all selections of rows. Then hopefully compare to a similar search using columns.

Re: Unigram distance curiosity

Posted: Sun Nov 19, 2017 5:38 pm
by doranchak
doranchak wrote:I am curious about other regions that show symbol exclusivity, such as the middle 8. I am going to try to run a brute force search for exclusive symbol counts for all selections of rows.

Here are the full raw results; I haven't organized them to summarize the interesting bits.

http://zodiackillerciphers.com/z340-exc ... ctions.txt

(Warning: it's a 54 megabyte text file)

To compare against the known result for the 12 rows (6 at the beginning and 6 at the end), here is a sampling of some of the best selections of rows.

Code:
n   rows   symbols   count   positions
12   [0, 1, 2, 4, 5, 7, 10, 14, 16, 17, 18, 19]   )/1:>@CDEHPWY_jk   16   57
12   [0, 1, 2, 4, 6, 7, 10, 12, 14, 16, 17, 18]   #%)/19:@ELPWXY_j   16   54
12   [0, 1, 2, 4, 6, 7, 10, 11, 14, 16, 17, 18]   %&)/1:@ELPTWXY_j   16   52
12   [1, 2, 4, 6, 7, 8, 11, 12, 13, 14, 15, 19]   #%&.49:;@ANXZfjq   16   52
12   [0, 1, 2, 4, 6, 7, 8, 12, 14, 16, 17, 19]   #%)169:;@AHLPXZj   16   51
12   [0, 1, 2, 4, 5, 8, 10, 14, 16, 17, 18, 19]   )/1:>ACDEHPWZdk   15   58
12   [0, 1, 2, 4, 5, 7, 8, 14, 16, 17, 18, 19]   )1:>@ACHPWZ_djk   15   54
12   [0, 1, 2, 4, 5, 6, 7, 14, 16, 17, 18, 19]   %)1:>@CHLPWX_jk   15   53
12   [1, 4, 6, 7, 8, 10, 11, 12, 13, 14, 15, 18]   %&.459@XY_bfjqt   15   52
12   [2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 19]   &459:;@AJUXZbjy   15   52
12   [0, 1, 2, 4, 6, 7, 12, 14, 16, 17, 18, 19]   #%)19:;@HLPWX_j   15   51
12   [0, 2, 4, 5, 6, 7, 8, 14, 16, 17, 18, 19]   1:>@ACHLPXZ_djk   15   51
12   [1, 4, 6, 7, 10, 11, 12, 13, 14, 15, 18, 19]   %&.59;@NXY_fjqt   15   51
12   [1, 4, 6, 7, 8, 10, 11, 12, 13, 14, 15, 19]   %&.459;@ANXbfjq   15   51
12   [0, 1, 2, 4, 5, 6, 7, 10, 14, 16, 17, 18]   %)/1:@CELPWXY_j   15   50
12   [0, 1, 2, 4, 6, 7, 9, 10, 14, 16, 17, 18]   %)/1:@ELPUWXY_j   15   50
12   [0, 1, 2, 6, 7, 8, 12, 13, 14, 15, 16, 19]   #%16:;@AGLNXZfq   15   50
12   [0, 2, 4, 5, 7, 8, 12, 14, 16, 17, 18, 19]   169:>@ACHPZ_djk   15   50
12   [1, 2, 4, 6, 7, 10, 11, 12, 13, 14, 15, 18]   #%&.59:@XY_fjqt   15   50
12   [1, 2, 4, 6, 7, 8, 10, 11, 12, 13, 14, 19]   #%&459:;@ANXZbj   15   50
12   [1, 4, 6, 7, 8, 10, 11, 12, 13, 14, 18, 19]   %&459;@ANXY_bjt   15   50
12   [0, 1, 2, 4, 5, 7, 8, 10, 14, 17, 18, 19]   /:>@ADEHPYZ_djk   15   49
12   [0, 1, 2, 4, 6, 7, 10, 13, 14, 16, 17, 18]   %)/1:@ELPWXY_jt   15   49
12   [0, 1, 2, 4, 6, 7, 10, 14, 16, 17, 18, 19]   %)/1:@EHLPWXY_j   15   49
12   [0, 2, 4, 5, 6, 7, 8, 12, 14, 16, 18, 19]   169:;>@ALXZ_djk   15   49
12   [0, 2, 4, 5, 7, 8, 10, 14, 16, 17, 18, 19]   /1:>@ACEHPZ_djk   15   49
12   [1, 2, 4, 6, 7, 8, 9, 11, 12, 13, 14, 19]   #%&49:;@AJNXZjy   15   49
12   [0, 1, 2, 4, 6, 7, 10, 11, 12, 14, 16, 17]   #%&)/19:@ELPTXj   15   48
12   [0, 1, 2, 4, 6, 7, 10, 12, 14, 16, 17, 19]   #%)/19:;@EHLPXj   15   48
12   [0, 1, 2, 4, 6, 7, 9, 10, 11, 14, 16, 17]   %&)/1:@EJLPTUXj   15   48
12   [0, 2, 4, 5, 7, 8, 10, 13, 14, 17, 18, 19]   /:>@AEHPZ_bdjkt   15   48
12   [0, 1, 2, 4, 6, 7, 10, 14, 15, 16, 17, 18]   %)/1:@ELPWXY_jq   15   47
12   [1, 2, 4, 6, 7, 8, 10, 12, 13, 14, 18, 19]   #%9:;@ANXYZ_bjt   15   46
12   [0, 2, 4, 6, 7, 8, 10, 12, 14, 16, 17, 19]   /169:;@AEHLPXZj   15   45


That may be an example of seeing the tail end of the bell curve for "symbol exclusivity", since I am not restricting row selection to specific patterns of rows.
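
The search itself can be sketched along these lines. This is an assumed reconstruction, not the code that produced the file above: the `best_row_selections` name and the toy grid are hypothetical, and results are sorted by exclusive symbol count and then by positions, as in the table.

```python
from itertools import combinations

def best_row_selections(rows, n, top=5):
    """Try every selection of n rows and report the symbols that occur only
    inside the selection, sorted by symbol count, then positions occupied."""
    all_idx = range(len(rows))
    results = []
    for chosen in combinations(all_idx, n):
        chosen_set = set(chosen)
        inside = "".join(rows[i] for i in chosen)
        outside = set("".join(rows[i] for i in all_idx if i not in chosen_set))
        exclusive = sorted(set(inside) - outside)
        # Count grid positions held by symbols that never occur outside.
        positions = sum(ch not in outside for ch in inside)
        results.append((chosen, "".join(exclusive), len(exclusive), positions))
    results.sort(key=lambda r: (-r[2], -r[3]))
    return results[:top]

# Toy grid with four rows of width 2.
toy_rows = ["AB", "CD", "AB", "CE"]
best = best_row_selections(toy_rows, 2)
```

Choosing 12 of 20 rows gives 125,970 selections, so the best of them will look extreme even for a shuffled cipher; the same search on shuffles would be needed to judge significance.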

Re: Unigram distance curiosity

Posted: Sun Nov 19, 2017 10:37 pm
by moonrock
Jarlve wrote:Moonrock, out of all his cycle types considered regional cycling for the 340.

My hypothesis was that the 340 cipher used a combination of regional cycles and semi-regional cycles for the nine most common English plaintext letters, and likely more typical methods for the rest, because the less frequent letters do not have as many substitutions to work with. This is evidenced by two things: (1) the combined frequency of all ciphertext letters that appear regionally and semi-regionally roughly matches the combined frequency of the nine most common English plaintext letters, and (2) ciphertext letters M and backward L alternate with each other perfectly for the entire cipher, and their combined frequency matches that of English plaintext letters D and L. D and L are the 10th and 11th most common English plaintext letters; they are similar to each other in frequency but not close to any other letters, so they act as a barrier between the high-frequency letters and the rest of the alphabet.

I created multiple test ciphers to check this after someone suggested doing so, and found that it was easy to manually produce ciphers with statistical characteristics similar to the 340 cipher. However, all of these ciphers were easily decipherable. That implies that, if my work is correct, a transposition method was likely used before homophonic substitution, and that this is why the 340 cipher hasn't been deciphered.

Something worth mentioning about regional and semi-regional cycles is that you can get a good guess of what plaintext letters a ciphertext letter might be substituting by looking at its frequency in areas where it does occur (in the case of regional cycles) or where its frequency is high (in the case of semi-regional cycles). These are only estimations, but consider ciphertext symbol W, which is limited to two areas of the ciphertext and has a high frequency in both areas. In that case, it isn't a stretch to assume that W is substituting a high frequency letter.

Another way to approach these two cycles is that ciphertext letters that behave inversely may be substituting the same plaintext letter whereas ciphertext letters that behave the same are unlikely to be the same plaintext letter, meaning that W is unlikely to be substituting the same letter as the circle with a horizontal line through it since these two symbols both only occur in the same lines as each other. It is more likely then that W is grouped with ciphertext letters that either don't occur where it occurs or occur at a relatively low frequency where it occurs.
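
A small sketch of how that grouping idea could be checked mechanically. It assumes "behaving inversely" can be approximated as two symbols never sharing a row, which is a simplification, and the function names and toy grid are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def symbol_rows(rows):
    """Map each symbol to the set of row indices in which it occurs."""
    occupied = defaultdict(set)
    for r, row in enumerate(rows):
        for ch in row:
            occupied[ch].add(r)
    return dict(occupied)

def disjoint_pairs(rows):
    """List symbol pairs that never share a row: under the idea above,
    candidates for substituting the same plaintext letter."""
    occupied = symbol_rows(rows)
    return [(a, b) for a, b in combinations(sorted(occupied), 2)
            if not occupied[a] & occupied[b]]

# Toy grid: A and B share rows, C and D share a row.
toy_rows = ["AB", "AB", "CD"]
pairs = disjoint_pairs(toy_rows)
```

Pairs that do share all of their rows, like A and B here, would be the ones considered unlikely to stand for the same letter.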