Re: CIPHER STRUCTURE

PostPosted: Wed Jun 03, 2015 8:00 am
by smokie treats
Experiment 1 J-ST

Yours is different from the 340 when I sort the two symbol comparisons by score. It's just that yours lines up so perfectly and it is easy to see what the cycles are:

Experiment1C.png


Z340 Scores 60% and up:

Experiment1D.png


See what I mean about patterns? That's part of why I was so convinced that he used cycles.

S.T.

Re: CIPHER STRUCTURE

PostPosted: Wed Jun 03, 2015 9:31 am
by smokie treats
Experiment 2 J-ST

Experiment2.png


That took about an hour. You wanted me to find Symbols 16, 17, 27 and 32.

Note that if two symbols have a cyclical relationship, then their count should be roughly the same. All of the symbols with a count of 7, for example, would have a good chance of being cyclically related to each other and also the symbols with a count of 5 and 6 because a cycle can get cut off at the end of the message.
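The count rule above can be turned into a quick shortlist. This is only a sketch: it assumes the cipher is a list of symbol numbers, and the function name, the tolerance of 2 and the minimum count are illustrative choices, not something taken from the spreadsheet.

```python
from collections import Counter
from itertools import combinations

def similar_count_pairs(cipher, tolerance=2, min_count=2):
    """Return symbol pairs whose counts differ by at most `tolerance`.

    The tolerance allows for a cycle being cut off at the end of the
    message. Both defaults are assumptions chosen for illustration.
    """
    counts = Counter(cipher)
    frequent = sorted(s for s, n in counts.items() if n >= min_count)
    return [
        (a, b)
        for a, b in combinations(frequent, 2)
        if abs(counts[a] - counts[b]) <= tolerance
    ]
```

A pair on this list is only a candidate; it still has to show an actual alternating pattern in the message.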

Symbol 16 only had two scores above 50%, which was a red flag because the shift in the two symbol distribution comparison is at 50%.

I worked my way up from the bottom of the table, and stopped at 16. I think that's what you wanted me to find.

Next time just a grid of numbers in ZKDecrypto format.

S.T.

P.S. I should also work the 340 again on the new spreadsheet in the near future.

Re: CIPHER STRUCTURE

PostPosted: Wed Jun 03, 2015 11:59 am
by doranchak
I don't know if this helps, but here is some more evidence that Z340 contains intentional cycles.

First, I updated my cycle search so it only counts sequences that are contiguous. For example, ABABAB will count as 3 repetitions but ABAAAB will only count as 1.
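A minimal sketch of that contiguity rule, assuming the cipher is first reduced to just the symbols of the candidate cycle and the longest unbroken run of the pattern is what gets counted. The function name is hypothetical and this is not the actual search code.

```python
def contiguous_repeats(cipher, cycle):
    """Longest run of back-to-back repetitions of `cycle`.

    The cipher is reduced to the symbols in `cycle` before scanning,
    so other symbols in between do not break a run.
    """
    cycle = list(cycle)
    if not cycle:
        return 0
    members = set(cycle)
    filtered = [s for s in cipher if s in members]
    n = len(cycle)
    best = 0
    start = 0
    while start + n <= len(filtered):
        run = 0
        end = start
        # Extend the run while the next n symbols repeat the cycle exactly.
        while filtered[end:end + n] == cycle:
            run += 1
            end += n
        best = max(best, run)
        start = end if run else start + 1
    return best
```

With this sketch, `contiguous_repeats("ABABAB", "AB")` gives 3 and `contiguous_repeats("ABAAAB", "AB")` gives 1.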

This spreadsheet shows all the cycles found, ordered by a probability score: https://docs.google.com/spreadsheets/d/ ... sp=sharing

There are 3 ciphers in there: Z340, Z408 and R340. R340 is a shuffled Z340. The tabs at the bottom let you pick different cycle lengths and ciphers.

The score that I'm using is a simple relative probability, done by looking at symbol counts. For example, consider a cycle ABABAB. Let "a" be the number of times "A" appears in the cipher, and "b" be the number of times "B" appears. Then, the probability of selecting A and B is (a/340)*(b/340). Since ABABAB has 3 contiguous repetitions, the full probability becomes (a/340)*(b/340)*(a/340)*(b/340)*(a/340)*(b/340). The thinking behind this approach is that there may be another cycle, CDCDCD. They have the same length, but what if there are more C's and D's in the cipher text than A's and B's? Then, the probability score will rank ABABAB above CDCDCD.
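The score described above can be sketched in a few lines. The function name and arguments are assumptions made for illustration; only the formula (the product of count/length for each symbol, raised to the number of repetitions) comes from the description.

```python
from collections import Counter

def cycle_probability(cipher, cycle, repetitions):
    """Relative probability score for `repetitions` passes of `cycle`.

    Each symbol contributes count/len(cipher) once per pass, so ABABAB
    (3 passes of AB) gives ((a/n) * (b/n)) ** 3.
    """
    counts = Counter(cipher)
    n = len(cipher)
    per_pass = 1.0
    for sym in cycle:
        per_pass *= counts[sym] / n
    return per_pass ** repetitions
```

For a 340-symbol text in which A and B each appear three times, `cycle_probability(text, "AB", 3)` is about 4.719e-13; with C and D each appearing four times, `cycle_probability(text, "CD", 4)` is about 3.670e-16.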

Another column of the spreadsheet is "Coverage", which I think is equivalent to smokie's percentage score.

I plotted the sorted probabilities for L=2 for Z340, R340, and Z408:

[Image: sorted probability scores for L=2 cycles in Z340, R340 and Z408]

Y axis is the probability score, and X axis is just the data point number. You'll see that at the left, all 3 ciphers have cycles with very low probabilities. But as you go to the right, you'll see that the probabilities go up, first for the shuffled cipher (r340), then for z340, then for z408.

So, the shuffled cipher (r340) seems to have far more cycles with greater probabilities than the cycles of the original cipher (z340). I still need to compare the distributions of cycle probabilities with more shuffled ciphers, but it sure does seem like z340 has intentional cycles in it.

Re: CIPHER STRUCTURE

PostPosted: Wed Jun 03, 2015 3:12 pm
by smokie treats
Thank you very much for the spreadsheets.

I am sure that we will be able to learn a great deal about the Z340 by examining the information.

S.T.

Re: CIPHER STRUCTURE

PostPosted: Wed Jun 03, 2015 4:14 pm
by Jarlve
Thanks for the information on probabilities doranchak.

Well done smokie, both ciphers are correctly identified. I wasn't sure how the interactions between cycles would play out.

I have now introduced various degrees of randomness to the majority of the cycles. It could still be considered cyclic, though probably on the weaker side. I'm wondering if you could identify any 1:1 substitutes here. Given the randomness here and there it may be very hard. I would also like to know how well you think this cipher compares to the 340, and whether you see reason to suspect any of them to be wildcards. I'm interested to know if the introduced randomness + 1:1 substitutes could simulate the wildcard hypothesis.

Good luck :D

Code:
-----------------
Numeric cipher for ZKDecrypto:
-----------------
1   2   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16 
17  18  17  19  20  21  22  23  4   24  25  26  27  28  29  30  31 
32  33  11  18  34  29  11  12  13  35  36  10  37  10  24  21  33 
2   15  38  39  40  41  21  42  9   43  18  44  31  29  26  32  45 
34  46  47  28  44  3   36  48  20  40  49  10  50  10  7   20  15 
34  21  51  46  42  11  33  30  4   1   29  6   27  16  8   45  32 
33  25  19  36  31  11  44  43  23  34  35  39  5   12  18  52  20 
9   32  53  54  2   34  38  42  55  14  46  43  56  57  48  2   41 
24  3   39  48  13  47  58  52  5   17  11  15  27  33  18  44  22 
28  6   36  34  31  26  10  57  40  18  44  17  30  29  1   59  40 
32  16  60  49  10  19  23  22  7   21  36  12  31  11  47  61  31 
18  44  24  13  28  27  25  35  36  42  49  18  4   5   46  9   33 
56  42  11  38  36  17  29  36  34  7   32  54  42  49  15  12  49 
62  11  53  43  18  20  59  7   48  51  41  16  48  1   3   29  58 
33  6   19  39  48  49  10  8   25  42  23  4   46  42  5   21  36 
22  26  36  37  10  56  24  28  60  35  32  38  39  5   11  33  55 
34  49  18  14  41  33  7   43  2   31  28  27  18  34  32  13  3   
4   10  11  48  36  10  21  25  5   11  6   2   42  29  21  17  19 
39  47  15  31  5   28  44  30  40  57  56  34  44  59  2   51  62 
1   18  44  63  22  57  52  11  4   22  53  8   28  44  23  36  12

Re: CIPHER STRUCTURE

PostPosted: Wed Jun 03, 2015 6:40 pm
by smokie treats
Jarlve: This is an excellent project and may take a little time. I will try to flush out the high count symbols that do not have strong cyclical relationships with other symbols and compare to Z340. The project is also making me think about my prior attempt at grouping the symbols into categories and how to do that in different ways, including some type of graphic. It's all trial and error, which I enjoy.

doranchak: Your spreadsheets are excellent and what I have been hoping for. I have been thinking about methods for scoring the cycles, and have some comments.

What I was trying to do with my scoring system was reward both total length and purity for purposes of flushing out the most likely candidates for Zodiac made cycles. For instance, ABABABAB, eight symbols in count, would have the same score as ABCDABCD, a 75% because six of the eight symbols in each (not including the first and last symbol) have a symbol to both the left and right of it that are the symbol that it should be if in the cycle. The logic being that short cycles such as ABAB would only score 50% and therefore not be of much use to me, since those could easily be random. A cycle ABABAB*B would have a score of 5/8=63%, which takes into account the missing symbol and makes the cycle rank lower on my sorted list of scored cycles; I would prefer to work with long perfect cycles if possible.

However... this is an artificial way to score the cycles and may not show their relative strengths; I have no idea if ABABABAB is any more or less likely to occur at random than ABCDABCD.
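A rough sketch of the percentage score as described, assuming the input has already been reduced to the symbols of one candidate cycle. The function name is hypothetical, and how a missing symbol (the * in ABABAB*B) should be scored is not handled here.

```python
def cycle_purity(sequence, cycle):
    """Share of symbols whose left AND right neighbours fit the cycle.

    `sequence` holds only symbols that belong to `cycle`. The first and
    last symbols never count, since each lacks one neighbour.
    """
    if not sequence:
        return 0.0
    n = len(cycle)
    position = {sym: i for i, sym in enumerate(cycle)}
    good = 0
    for i in range(1, len(sequence) - 1):
        k = position[sequence[i]]
        left_ok = sequence[i - 1] == cycle[(k - 1) % n]
        right_ok = sequence[i + 1] == cycle[(k + 1) % n]
        if left_ok and right_ok:
            good += 1
    return good / len(sequence)
```

This reproduces the perfect-cycle figures: 0.75 for both ABABABAB and ABCDABCD, and 0.5 for ABAB.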

I grasp the concept of your scoring system:

The score that I'm using is a simple relative probability, done by looking at symbol counts. For example, consider a cycle ABABAB. Let "a" be the number of times "A" appears in the cipher, and "b" be the number of times "B" appears. Then, the probability of selecting A and B is (a/340)*(b/340). Since ABABAB has 3 contiguous repetitions, the full probability becomes (a/340)*(b/340)*(a/340)*(b/340)*(a/340)*(b/340). The thinking behind this approach is that there may be another cycle, CDCDCD. They have the same length, but what if there are more C's and D's in the cipher text than A's and B's? Then, the probability score will rank ABABAB above CDCDCD.


Let's say that A and B each appear three times in the message with ABABAB. I get 4.71904E-13. Now if C and D each appear four times with CDCDCDCD, I get 3.66985E-16. Does ABABAB score higher or lower than CDCDCDCD? CDCDCDCD would rank higher, even though the actual score number is lower, correct?

I am not sure that only ranking the cycle strings that have two contiguous perfect cycles will show all of the most likely candidates for Zodiac-made cycles. For instance, ABA*ABABABABAB would not show up at the top of your spreadsheet, but it would score higher with other methods.

I have another idea that takes from your shuffle idea.

Let's say we identify a handful of common patterns and find the mean number of random shuffles, out of 30 trials, needed to get each pattern to appear. Then we might have a more relevant way to measure how the cycles compare to each other.

ABABABC 10 random shuffles
ABCDABCD 100 random shuffles
ABCABCABC 200 random shuffles

Something like that. Real simple. A little table of common patterns to give us some perspective.
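The shuffle table could be produced with something like the sketch below. All names are made up for illustration, the pattern test is passed in as a function, and the cap on shuffles per trial is an assumption added so a trial cannot run forever.

```python
import random

def has_cycle_run(cipher, cycle, repeats):
    """True if `cycle` repeats `repeats` times in a row once the
    cipher is reduced to the cycle's own symbols."""
    members = set(cycle)
    filtered = [s for s in cipher if s in members]
    target = list(cycle) * repeats
    width = len(target)
    return any(filtered[i:i + width] == target
               for i in range(len(filtered) - width + 1))

def shuffles_until(cipher, predicate, rng, max_shuffles=100_000):
    """Shuffle repeatedly; return how many shuffles it took for
    `predicate` to hold, or None if the cap was reached."""
    work = list(cipher)
    for attempt in range(1, max_shuffles + 1):
        rng.shuffle(work)
        if predicate(work):
            return attempt
    return None

def mean_shuffles(cipher, predicate, trials=30, seed=0, max_shuffles=100_000):
    """Mean shuffle count over `trials`; capped trials are left out."""
    rng = random.Random(seed)
    results = [shuffles_until(cipher, predicate, rng, max_shuffles)
               for _ in range(trials)]
    finished = [r for r in results if r is not None]
    return sum(finished) / len(finished) if finished else None
```

A call such as `mean_shuffles(cipher, lambda c: has_cycle_run(c, (7, 25, 35, 38), 3))` would give one row of the table. Rare patterns may need a much higher cap, or a different estimate altogether.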

Memories in my old brain from statistics class tell me that statisticians often take 30 random samples from a population to compare with a population under study. But that example is not exactly like what we are dealing with here. Just thinking that more than 30 shuffles may not change the results all that much. Just thinking that this might be the most practical approach and eliminates trying to make any artificial scoring system that may not be the best one.

Another thought: I enjoy working with the two of you. I just hope that we are working smart instead of working hard. You guys know how to solve these types of problems with computers, and have powerful programs. I am wondering if either of you can think of ways to speed up finding a solution with what we have already learned or could observe from looking over the top scoring cycles that doranchak has identified.

Thanks, and I will work on Experiment 3 J-ST and think of another way to categorize the symbol groups with my currently artificial but not necessarily most relevant scoring system.

S.T.

Re: CIPHER STRUCTURE

PostPosted: Thu Jun 04, 2015 6:02 am
by Jarlve
smokie treats wrote:Jarlve: This is an excellent project and may take a little time. I will try to flush out the high count symbols that do not have strong cyclical relationships with other symbols and compare to Z340. The project is also making me think about my prior attempt at grouping the symbols into categories and how to do that in different ways, including some type of graphic. It's all trial and error, which I enjoy.


I also believe in trial and error; we humans basically hill climb our problems to a solution. That's why (most of us) are not calculators. Here are 100 randomised/shuffled 340's in numbers format, each renumbered by order of appearance: download from google.

smokie treats wrote:Another thought: I enjoy working with the two of you. I just hope that we are working smart instead of working hard. You guys know how to solve these types of problems with computers, and have powerful programs. I am wondering if either of you can think of ways to speed up finding a solution with what we have already learned or could observe from looking over the top scoring cycles that doranchak has identified.


I have an interesting idea. You guys have been looking at probabilities of individual cycles (well, mostly). I propose looking at the probabilities of a full distribution of cycles, with the goal of reducing the number of symbols from 63 to somewhere around 20-26. A hill climber that sorts out the number of letters and the cycle per letter.

Key1: number of letters.
Key2: symbols (part of cycle) per letter.
Operations: a) increment/decrement letter, b) swap symbols.
Measurement system has to be such that the total score of the actual cycles in the cipher is the global optimum.
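A bare-bones sketch of such a hill climber, with the measurement system left as a function the caller supplies, since that is the open question. Every name here is hypothetical, and the three moves are just one reading of the operations listed above.

```python
import random

def to_groups(assignment):
    """Turn a symbol -> letter mapping into a list of symbol groups."""
    groups = {}
    for sym, letter in assignment.items():
        groups.setdefault(letter, []).append(sym)
    return [sorted(g) for g in groups.values()]

def neighbour(assignment, rng):
    """Apply one random move: increment, decrement or swap."""
    new = dict(assignment)
    syms = list(new)
    letters = sorted(set(new.values()))
    move = rng.choice(["increment", "decrement", "swap"])
    if move == "increment":
        # Split one symbol off into a brand new letter.
        new[rng.choice(syms)] = max(letters) + 1
    elif move == "decrement" and len(letters) > 1:
        # Merge every symbol of one letter into another letter.
        src, dst = rng.sample(letters, 2)
        for s in syms:
            if new[s] == src:
                new[s] = dst
    else:
        # Swap the letters of two symbols.
        a, b = rng.sample(syms, 2)
        new[a], new[b] = new[b], new[a]
    return new

def hill_climb(symbols, score, steps=5000, seed=0):
    """Keep a move whenever `score(groups)` does not get worse.

    Needs at least two symbols. `score` is a placeholder for the
    measurement system; nothing here says what it should be.
    """
    rng = random.Random(seed)
    current = {s: i for i, s in enumerate(symbols)}  # one letter per symbol
    best = score(to_groups(current))
    for _ in range(steps):
        candidate = neighbour(current, rng)
        value = score(to_groups(candidate))
        if value >= best:
            current, best = candidate, value
    return to_groups(current), best
```

Starting with one letter per symbol means the climber mostly has to merge. Whether the true key ends up as the global optimum depends entirely on the scoring function passed in.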

There may be some problems with this system: it may or may not have multiplicity issues. If some transposition was applied after or during encoding of the 340 then it will fail. If the wildcard hypothesis is true then it may need adjustment. It may also have problems when the cycles are actually randomised.

Doranchak, do you think it is possible to come up with such a measurement system? If you think it's possible and you see some merit in this approach, by all means feel free to try it. If you think it's improbable or don't have the time to try this approach, could you point me in the right direction? Thank you.

Re: CIPHER STRUCTURE

PostPosted: Thu Jun 04, 2015 8:44 am
by doranchak
smokie treats wrote:What I was trying to do with my scoring system was reward both total length and purity for purposes of flushing out the most likely candidates for Zodiac made cycles. For instance, ABABABAB, eight symbols in count, would have the same score as ABCDABCD, a 75% because six of the eight symbols in each (not including the first and last symbol) have a symbol to both the left and right of it that are the symbol that it should be if in the cycle. The logic being that short cycles such as ABAB would only score 50% and therefore not be of much use to me, since those could easily be random. A cycle ABABAB*B would have a score of 5/8=63%, which takes into account the missing symbol and makes the cycle rank lower on my sorted list of scored cycles; I would prefer to work with long perfect cycles if possible.

However... this is an artificial way to score the cycles and may not show their relative strengths; I have no idea if ABABABAB is any more or less likely to occur at random than ABCDABCD.


We can compute that directly from a very simple example, and then build from it. Let's consider a cipher with the alphabet {A,B,C,D}. To support the appearance of the cycles ABABABAB and ABCDABCD, the minimum frequency counts for the alphabet have to be: {4,4,2,2}. So, to use all those symbols, let's start with a cipher length of 4+4+2+2 = 12.

Since our "mini cipher" has length 12, and ABABABAB is of length 8, the cycle ABABABAB can appear in 5 different spots:

ABABABAB****
*ABABABAB***
**ABABABAB**
***ABABABAB*
****ABABABAB

Since we've used up all the A's and B's in those 5 possibilities, the remaining 4 spots are filled with C's and D's. There are 6 ways for those to appear: CCDD, CDCD, CDDC, DCCD, DCDC, and DDCC. That means that ABABABAB appears in 5*6 = 30 of all the possible cipher configurations.

By the same logic, the cycle BABABABA also appears in 30 of all possible cipher configurations. CDCDCDCD cannot be formed because we don't have enough Cs and Ds.

So, there are 2*30 = 60 possible cipher configurations that would contain ABABABAB or BABABABA.

Now let's look at ABCDABCD. It, too, can appear in 5 different spots. Then, the available remaining symbols are 2 As, 2 Bs. With the same reasoning as above, there are 6 ways for the remaining As and Bs to appear, so ABCDABCD appears in 30 of all possible cipher configurations.

But we would also notice cycles such as DCBADCBA, right? So we need to count those as well. But now we realize there are many more possibilities (24 of them):

ABCDABCD, ABDCABDC, ACBDACBD, ACDBACDB, ADBCADBC, ADCBADCB, BACDBACD, BADCBADC, BCADBCAD, BCDABCDA, BDACBDAC, BDCABDCA, CABDCABD, CADBCADB, CBADCBAD, CBDACBDA, CDABCDAB, CDBACDBA, DABCDABC, DACBDACB, DBACDBAC, DBCADBCA, DCABDCAB, DCBADCBA

So, there are 24*30 = 720 possible cipher configurations that would contain this kind of cycle pattern. Note that 720/60 = 12. There are 12 times as many patterns of the form ABCDABCD as patterns of the form ABABABAB.

That means ABABABAB is much less likely to occur by chance than ABCDABCD, provided that you are considering all the other forms of the patterns.

Does this make sense?

To verify my analysis, I wrote a quick program to generate all the unique cipher texts from the alphabet described above. Here are the outputs: http://zodiackillerciphers.com/mini-cipher.txt

There are 207,900 unique cipher texts from that alphabet. The probability of ABABABAB or BABABABA occurring by chance is 60/207900 = 0.03%. The probability of ABCDABCD (or one of the other 23 pattern types like that) occurring by chance is 720/207900 = 0.35%.

Of course, these probabilities change as we adjust the frequencies of the symbols in the cipher alphabet. I think it would be a worthy exercise to compute the exact probabilities based on the cipher length and symbol frequencies, for any cipher text. Then it will give you an exact way to compare different cycles to each other.
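The counting above can be checked by brute force. This is a sketch written for illustration, not the program that produced mini-cipher.txt, and the function names are made up. It builds every distinct 12-symbol text from the counts {A:4, B:4, C:2, D:2} and reports two tallies: texts containing at least one pattern, and per-pattern hits added together.

```python
from itertools import permutations

def multiset_strings(counts):
    """Yield every distinct string using each symbol counts[symbol] times."""
    remaining = dict(counts)
    length = sum(remaining.values())
    prefix = []

    def build():
        if len(prefix) == length:
            yield "".join(prefix)
            return
        for sym in sorted(remaining):
            if remaining[sym] > 0:
                remaining[sym] -= 1
                prefix.append(sym)
                yield from build()
                prefix.pop()
                remaining[sym] += 1

    yield from build()

def four_cycle_patterns(alphabet="ABCD"):
    """All 24 doubled orderings: ABCDABCD, ABDCABDC, ..."""
    return ["".join(p) * 2 for p in permutations(alphabet)]

def count_texts(counts, patterns):
    """Return (total texts, texts with any pattern, summed pattern hits)."""
    total = hits_any = hits_summed = 0
    for text in multiset_strings(counts):
        total += 1
        found = sum(1 for p in patterns if p in text)
        hits_summed += found
        hits_any += found > 0
    return total, hits_any, hits_summed

if __name__ == "__main__":
    mini = {"A": 4, "B": 4, "C": 2, "D": 2}
    print(count_texts(mini, ["ABABABAB", "BABABABA"]))
    print(count_texts(mini, four_cycle_patterns()))
```

The summed tally reproduces the 60 and 720 figures. For the four-symbol patterns, the number of distinct texts is smaller than 720, because one text can hold several rotations at once: ABCDABCDABAB contains ABCDABCD, BCDABCDA and CDABCDAB. The 12-to-1 ratio is therefore an upper bound if the question is "how many texts contain at least one such pattern".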

Re: CIPHER STRUCTURE

PostPosted: Thu Jun 04, 2015 11:34 am
by smokie treats
Wow, this is getting really intense. Jarlve, I have been working on Experiment 3 J-ST.

Experiment3e.png


I. Search for 1:1 or wildcards

A. First, I checked all symbols with a count of 5 or more that had few scores above 50%. Not an exact science, but I looked there first. Strong candidates for 1:1 or wildcard are 10 and 2. Symbol 2, however, does have some cycling with 28, which is either random or caused by multiple deletions or substitutions in the cycle.

Other possible candidates are 11, 36, 21, 29, 48 and 49. However, all cycle with other symbols.

Marginal candidates are 17 and 49, which also cycle with other symbols.

B. Only one symbol with a count of 6 or more had perfect cycles, but there are many symbols with counts of 6 or more that score high. This suggests cycles that include symbols with a count of 6 or higher, but have missing symbols.

There are many perfect cycles as well, especially with symbols that have a count of 5 or less. The distribution below shows this. Green is Experiment 3 J-ST and red is Z340. There are more higher scoring or perfect cycles than Z340, but it is similar.

Experiment3a.png


Because there are definitely a lot of cycles, this supports the theory that 2 is a 1:1 or wildcard. Symbol 2 is high count, but if 2 was in a cycle, it would not be in a side by side condition as seen in row 1. If 22 was a wildcard, however, then 22 would have to start a cycle as a wildcard because 22 sits next to 1, and 1 is in a cycle with 23, 35, 38, and 41.

Re: CIPHER STRUCTURE

PostPosted: Thu Jun 04, 2015 11:53 am
by smokie treats
II. Examples of High Scoring Cycles with Missing Symbols

A. Symbols 32 and 33 are probably in an intentional cyclical relationship, but have missing symbols:

32 33 * 33 32 33 32 33 32 33 32 33 32 33 32 33 * 33 32

Symbol 32 does have some cycling with 5, 11, 39 and 43, but if any of those are added to the 32 33 cycle, I don't see much of a pattern.

In the message below, I show that some of the suggested 1:1 substitutes are there where the missing 32 should be found. The A represents 32 and the B represents 33. A=32 and B=33.

Experiment3b.png


B. Symbols 7, 25, 35 and 38 are probably in an intentional cycle, but have three contiguous missing symbols that could be replaced with wildcards:

7 25 35 38 7 25 35 38 7 25 35 38 7 * * * 7 25 35 38 7 25

A=7, B=25, C=35 and D=38

Experiment3c.png


I am really curious to know if you did this on purpose, or perhaps one of the symbols falls into the cycle by chance. Or is all of this a coincidence?

C. Symbols 9, 16, 19 and 23 are probably in an intentional cycle, but have two contiguous missing symbols that could be replaced with wildcards:

9 16 19 23 9 16 19 23 9 16 19 23 9 16 19 23 * * 19 23

Experiment3D.png


How are those for examples? Did you create any of those cycles? I could look for more but am curious if I am finding random cycles or not.
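The starred sequences above can be generated mechanically. The sketch below assumes the symbols of one candidate cycle have been pulled out of the message in reading order; it then walks the cycle and writes a * wherever the expected symbol was skipped. The function name is made up for illustration.

```python
def align_to_cycle(observed, cycle):
    """Insert '*' wherever `observed` skips a symbol of `cycle`.

    `observed` must contain only symbols from `cycle`. Alignment starts
    at whichever cycle position the first observed symbol occupies.
    """
    cycle = list(cycle)
    if not observed:
        return []
    pos = cycle.index(observed[0])
    aligned = []
    for sym in observed:
        if sym not in cycle:
            raise ValueError(f"{sym!r} is not part of the cycle")
        # Step through the cycle, marking each skipped symbol.
        while cycle[pos] != sym:
            aligned.append("*")
            pos = (pos + 1) % len(cycle)
        aligned.append(sym)
        pos = (pos + 1) % len(cycle)
    return aligned
```

Feeding it the 7/25/35/38 occurrences in message order gives the same three stars as in example B, and the 9/16/19/23 occurrences give the two stars in example C.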

S.T.