
Re: Recipes for primephobia

Posted: Sat Dec 12, 2015 3:30 am
by Jarlve
_pi wrote: One way I found to channel certain symbols into prime-safe columns, as I described in my original post, without involving a polyalphabetic approach, is to assign a homophonic symbol to a letter based on its position in the plaintext. This way, you again find the cyclical component required to induce this prime-safe effect.

Interesting, could you make an example cipher?

Re: Recipes for primephobia

Posted: Sat Dec 12, 2015 4:30 pm
by _pi
Jarlve wrote: Interesting, could you make an example cipher?


Code:
cDBPbqcDDBs8u+STh
+4+k3yWaBCBxxtQy5
o7ysBCdW2SU+mXsC9
3sqcDDBP8ZBDKM3r+
Bsf9amSUU+xC4+J3y
x+rHscxC9brtxClHR
n+USXxiPBrHDtL3Dh
CtqBDhxtQ+C9BRnnB
Y+x2+Co+2SWCfNUBD
DdsM+EVbUaP5+dCcW
+zbs4+CCbUe93RM+e
CBsnFtyUwO5qxSmLA
cCo38BUDC9b4axCTH
weSmBfBxCo3fZo+sB
6BaBZBhDI+w+ISUsB
Ru3w3lBk+is63hDe9
+B93Y+qBhDa6ABhD4
+JS2+rFWDHYaWBZdD
gPSCnBY+1SXrFP3Qb
jaJ3XW+FSXAdDgCUF
fSWDSZlSZRSwxCtu2
1JSgD+JfBsnSmWDiz
+xLSUQ137CaVDBL+a
j+SUBaC+rbCNoVBfB


I re-encoded the z408's plaintext using this scheme. The resulting cipher is easily solvable in a few seconds in zkdecrypto.

This homophonic cipher is therefore 408 symbols long, and it uses 59 different symbols. The most frequent symbols are B, +, C, D, S and 3; together they occur 148 times, covering 36% of the cipher. Of these 148 occurrences, only 2 fall on prime positions, making these symbols 99% prime phobic.

In more detail, the encoding scheme is as follows:

For the following letters, choose the symbol whose position in the list corresponds cyclically to the letter's position in the cipher:
Code:
Letter: Symbols
---------------
A:  i 3 3 3 H 3
E:  a + + + b +
I:  c B B B d B
L:  g D D D h D
O:  t S S S O S
T:  e C C C f C

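In other words, if X is the 1-based position of the letter in the cipher, pick the symbol in slot X % 6 of its list, where a remainder of 0 means the 6th slot. Since every prime greater than 3 leaves a remainder of 1 or 5 when divided by 6, primes only ever receive the symbols in slots 1 and 5; the repeated symbols in slots 2, 3, 4 and 6 can land on a prime only at positions 2 and 3.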

For all the other plaintext letters, randomly choose a symbol from the list allocated to that letter.
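For anyone who wants to reproduce the effect, here is a minimal Python sketch of the scheme above. The cyclic lists are copied from the table; `other_homophones` is a hypothetical key for the remaining letters (those lists weren't posted), and `is_prime`/`prime_hits` are illustrative helpers, not part of the original scheme.

```python
import random

# Cyclic lists from the table above; slot 1 is index 0.
CYCLIC = {
    "A": ["i", "3", "3", "3", "H", "3"],
    "E": ["a", "+", "+", "+", "b", "+"],
    "I": ["c", "B", "B", "B", "d", "B"],
    "L": ["g", "D", "D", "D", "h", "D"],
    "O": ["t", "S", "S", "S", "O", "S"],
    "T": ["e", "C", "C", "C", "f", "C"],
}


def encode(plaintext, other_homophones, rng=None):
    """Encode an upper-case, letters-only plaintext with the cyclic scheme."""
    if rng is None:
        rng = random.Random()
    out = []
    for x, letter in enumerate(plaintext, start=1):
        if letter in CYCLIC:
            # 1-based position x uses slot ((x - 1) % 6) + 1 of the list
            out.append(CYCLIC[letter][(x - 1) % 6])
        else:
            # hypothetical key: random pick from this letter's homophones
            out.append(rng.choice(other_homophones[letter]))
    return "".join(out)


def is_prime(n):
    """Trial division; plenty fast for cipher-sized positions."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_hits(cipher, symbols):
    """1-based prime positions at which any of `symbols` occurs."""
    return [x for x, s in enumerate(cipher, start=1) if s in symbols and is_prime(x)]


# Made-up key: every other letter gets its own lower-case letter as symbol.
key = {c: [c.lower()] for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}
cipher = encode("ILIKEKILLINGPEOPLE", key)
print(cipher)                                # cDBkbkcDDBngp+Sph+
print(prime_hits(cipher, set("3+BDSC")))     # [2, 3]
```

The cyclic letters come out the same as in the first row of the cipher above (c, D, B, b, c, D, D, B, +, S, h); the other letters differ only because the key is made up.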

This is obviously an exaggerated example, but it demonstrates that, under the right conditions, picking symbols cyclically when constructing a homophonic substitution cipher can make one or more frequent symbols prime phobic.

_pi

Re: Recipes for primephobia

Posted: Sun Dec 13, 2015 2:29 am
by Jarlve
Thanks _pi.

_pi wrote: This is obviously an exaggerated example, but it demonstrates that, under the right conditions, picking symbols cyclically when constructing a homophonic substitution cipher can make one or more frequent symbols prime phobic.

Yes, I think your cipher is about 4 times as prime phobic as the 340. There is also a huge odd/even discrepancy, which can also be seen in the 340 (though not to this extent). I tested your cipher with my m_s2_cycles measurement for homophone sequences, and it scores below average, so the cycles are fully randomized.

That's really interesting, because it correlates with the 340 to some degree. I wonder if a similar table was used, where one non-prime column is selected polyalphabetically. This could explain some of the randomization and the prime phobia!

Code:
Letter: Symbols
---------------
A:  i 3 3 3 H 3
E:  a + + + b +
I:  c B B B d B
L:  g D D D h D
O:  t S S S O S
T:  e C C C f C

I also wonder if it could relate to this: http://cypherpunks.venona.com/date/1993 ... 00354.html, although I don't understand how to apply the encoding, which is described as second-order homophonic substitution.

Re: Recipes for primephobia

Posted: Sun Dec 13, 2015 5:18 am
by Jarlve
I made a measurement for this. For each symbol and each modulus, count c, the number of the symbol's positions that are divisible by the modulus; sum c*(c-1) over all moduli and divide that total by the frequency of the symbol. It seems to work perfectly! It filters out the "prime" suspects in the 340, 408 and pi2 ciphers. The tables are sorted by score; the field on the right is the score.

Code:
Stats for: 340.txt
-----------------
Symbol number by appearance, ASCII symbol, frequency, score.
-----------------
19            +             24            21.83
20            B             12            12.5
16            2             9             8.88
3             R             8             8
56            C             5             6.4
8             V             6             5.66
51            F             10            5.6
23            O             10            5.4
28            .             6             5.33
22            #             5             5.2
21            (             7             5.14
50            5             7             5.14
29            <             6             5
39            Z             4             5
6             l             7             4.85
42            S             4             4.5
5             p             11            4
7             ^             6             4
18            N             5             4
10            k             5             3.6
48            t             4             3.5
36            c             10            3.2
31            K             7             3.14
40            z             9             3.11
4             >             4             3
62            A             2             3
33            )             5             2.8
12            1             3             2
43            7             3             2
47            9             4             2
49            j             2             2
53            4             6             2
61            b             3             2
55            -             5             1.6
44            8             4             1.5
11            |             10            1.4
14            T             5             1.2
17            d             5             1.2
26            W             6             1
27            Y             4             1
30            *             6             1
32            f             4             1
52            &             2             1
2             E             3             0.66
9             P             3             0.66
46            _             3             0.66
54            /             3             0.66
58            ;             3             0.66
37            M             7             0.57
25            D             4             0.5
41            J             4             0.5
38            U             5             0.4
15            G             6             0.33
1             H             4             0
13            L             6             0
24            %             2             0
34            y             5             0
35            :             2             0
45            3             2             0
57            q             2             0
59            X             2             0
60            @             1             0
63            6             3             0

Code:
Stats for: 408.txt
-----------------
Symbol number by appearance, ASCII symbol, frequency, score.
-----------------
26            q             16            16.37
6             U             10            11
7             B             12            9.83
14            W             9             9.33
2             %             11            8.18
1             9             14            8
36            D             6             8
29            ^             6             7.66
32            T             7             6
15            V             9             5.55
11            =             7             5.42
3             P             11            5.27
8             k             9             4.88
22            H             8             4.5
50            8             8             4.25
23            @             6             4
30            I             11            4
33            t             7             4
39            S             6             4
42            A             8             4
45            E             9             4
27            M             8             3.75
4             /             6             3.66
13            X             9             3.55
10            R             12            3.5
21            6             8             3.5
37            5             8             3.5
54            _             8             3.5
46            L             8             3.25
17            e             10            3
9             O             7             2.85
41            #             10            2.8
49            \             5             2.8
16            +             8             2.75
47            d             6             2.66
38            )             8             2.5
5             Z             8             2
44            l             5             2
19            Y             10            1.8
48            r             7             1.71
28            J             6             1.66
12            p             6             1.33
43            f             3             1.33
52            c             6             1.33
24            K             5             1.2
25            !             5             1.2
20            F             6             1
40            (             4             1
35            Q             5             0.8
34            N             6             0.66
18            G             7             0.28
31            7             3             0
51            z             4             0
53            j             1             0

Code:
Stats for: pi2.txt
-----------------
Symbol number by appearance, ASCII symbol, frequency, score.
-----------------
3             B             33            39.03
10            +             33            35.51
2             D             23            30.26
20            C             24            25.91
11            S             20            23.4
16            3             15            16.8
7             s             13            8.15
36            r             7             8
21            x             14            7.85
4             P             6             6.66
29            U             13            6.61
23            Q             4             6
18            W             9             5.55
41            R             5             3.6
32            9             8             3.5
8             8             3             3.33
6             q             5             3.2
51            F             5             2.8
52            w             5             2.8
24            5             3             2.66
30            m             5             2.4
12            T             2             2
59            j             2             2
33            Z             6             1.66
31            X             5             1.6
44            L             4             1.5
13            h             7             1.42
25            o             6             1.33
48            V             3             1.33
54            A             3             1.33
19            a             11            1.27
14            4             5             1.2
27            d             5             1.2
1             c             6             1
5             b             8             1
49            z             2             1
28            2             5             0.8
35            M             3             0.66
58            1             3             0.66
38            J             5             0.4
39            H             5             0.4
42            n             6             0.33
22            t             7             0.28
37            f             7             0.28
9             u             3             0
15            k             2             0
17            y             5             0
26            7             2             0
34            K             1             0
40            l             3             0
43            i             3             0
45            Y             4             0
46            N             2             0
47            E             1             0
50            e             4             0
53            O             1             0
55            6             3             0
56            I             2             0
57            g             3             0
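A minimal Python sketch of the per-symbol score described above, for anyone who wants to produce tables like these. It assumes 1-based positions and a modulus range from 2 up to half the cipher length (the 2..170 range used for the 340 in the next post); the exact range behind these three tables isn't stated, so the numbers may not match them exactly. `spmf_scores` is a hypothetical name.

```python
from collections import defaultdict


def spmf_scores(cipher, mod_from=2, mod_to=None):
    """Per-symbol score: sum of c*(c-1) over moduli, divided by frequency.

    c is the number of the symbol's 1-based positions divisible by the
    modulus j, for j in [mod_from, mod_to]. mod_to defaults to half the
    cipher length (an assumption). Returns symbols sorted by score, highest first.
    """
    if mod_to is None:
        mod_to = len(cipher) // 2
    positions = defaultdict(list)
    for x, symbol in enumerate(cipher, start=1):
        positions[symbol].append(x)
    scores = {}
    for symbol, pos in positions.items():
        total = 0
        for j in range(mod_from, mod_to + 1):
            c = sum(1 for x in pos if x % j == 0)
            total += c * (c - 1)
        scores[symbol] = total / len(pos)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# A symbol sitting only on even positions collects a high score,
# one sitting only on odd positions scores 0 with this modulus range.
print(spmf_scores("XYXYXYXY"))  # {'Y': 3.5, 'X': 0.0}
```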

Re: Recipes for primephobia

Posted: Sun Dec 13, 2015 8:08 am
by Jarlve
Here's the measurement function that gives a raw score for the full cipher. It requires the input cipher to be numbered by appearance. I named it m_spmf, which stands for symbol position modulo frequencies. What else?

:D

Usage for the 340: m_spmf(cipher(),340,63,2,170). The last two arguments determine the from-to modulo range. mod_to never needs to be more than half of total_symbols: at this length only two positions, 170 and 340, are divisible by 170, and any larger modulus divides at most one position, so it adds nothing to the score.

Output:
340: 181
408: 205
pi2: 268

To compare the score with other ciphers, it's probably best to calculate the percentage difference from the average score of randomized (shuffled) versions of the cipher. I think the 340 is fairly normal in this regard, but some symbols stick out.

Code:
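function m_spmf(cipher()as short,byval total_symbols as short,byval unique_symbols as short,byval mod_from as short,byval mod_to as short)as double
   'cipher(1..total_symbols) holds symbols numbered by appearance (1..unique_symbols)
   dim as integer i,j,k,t
   dim as double score
   'symbols(s,0) = frequency of symbol s, symbols(s,1..) = its positions (at most 100 per symbol)
   dim as short symbols(unique_symbols,100)
   for i=1 to total_symbols
      symbols(cipher(i),0)+=1
      symbols(cipher(i),symbols(cipher(i),0))=i
   next i
   for i=1 to unique_symbols
      'skip unused symbol numbers to avoid a division by zero
      if symbols(i,0)=0 then continue for
      for j=mod_from to mod_to
         'count how many positions of symbol i are divisible by j
         t=0
         for k=1 to symbols(i,0)
            if symbols(i,k)mod j=0 then t+=1
         next k
         'c*(c-1) weighting, normalized by the frequency of the symbol
         score+=t*(t-1)/symbols(i,0)
      next j
   next i
   return score
end function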
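For anyone not running the BASIC function above, here is a Python sketch of the same total score, plus one way to do the comparison with randomized versions: shuffle the cipher many times and report the percentage difference from the mean shuffled score. It loops over positions j, 2j, 3j, ... per modulus instead of over each symbol's positions, which gives the same total. The number of shuffles, the seed and the default mod_to (half the length) are my assumptions, not Jarlve's settings.

```python
import random
from collections import Counter


def m_spmf(cipher, mod_from=2, mod_to=None):
    """Total symbol position modulo frequency score (Python sketch).

    Symbols can be any hashable values; they need not be numbered by appearance.
    """
    n = len(cipher)
    if mod_to is None:
        mod_to = n // 2
    freq = Counter(cipher)
    score = 0.0
    for j in range(mod_from, mod_to + 1):
        # t[s] = how many positions divisible by j carry symbol s
        t = Counter(cipher[x - 1] for x in range(j, n + 1, j))
        for symbol, count in t.items():
            score += count * (count - 1) / freq[symbol]
    return score


def percent_vs_shuffled(cipher, trials=1000, seed=0, **kwargs):
    """Percentage difference of the observed score from the mean shuffled score."""
    rng = random.Random(seed)
    observed = m_spmf(cipher, **kwargs)
    symbols = list(cipher)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(symbols)
        total += m_spmf(symbols, **kwargs)
    mean = total / trials
    if mean == 0:
        raise ValueError("mean shuffled score is 0; percentage is undefined")
    return 100.0 * (observed - mean) / mean
```

With the 340 loaded as a list of 340 symbols, `percent_vs_shuffled(symbols, mod_to=170)` would give the percentage for the 2..170 range used above.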

Re: Recipes for primephobia

Posted: Tue Dec 22, 2015 7:44 pm
by smokie treats
Having both the + and B be prime phobic seems important. Those two symbols are also unusual because they are high-count and do not cycle well with other symbols. They could be 1:1 substitutes, nulls, or part of multiple cycles, which would make them look like 1:1 substitutes while really being polyalphabetic.

Random shuffles are one way to examine an observed statistical phenomenon, but sometimes a relatively simple cipher can surprisingly re-create a statistical phenomenon without any intention to do so.

On the other hand, perhaps there were two keys: one for primes and one for non-primes.

That seems fairly labor-intensive, and it still doesn't explain the one + and the one B that land on prime positions.

At some point I am going to have to play with prime phobia. We found that there is an odd/even phenomenon, and that three of the symbols that land only on odd positions cycle with each other. Prime numbers, with the exception of 2, are all odd. So I am thinking about making two lists, symbols that land only on primes and symbols that land only on non-primes, and then checking those mutually exclusive symbols to find out whether some of them cycle with each other.
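A minimal Python sketch of the two-list idea: it collects the symbols that occur only on prime positions and those that occur only on non-prime positions (1-based). Passing a different test, such as "position is odd", gives the odd/even lists the same way. `exclusive_symbols` is a hypothetical helper; checking whether the resulting symbols cycle with each other is left to whatever cycle measurement you already use.

```python
def is_prime(n):
    """Trial division; plenty fast for cipher-sized positions."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def exclusive_symbols(cipher, predicate):
    """Return (only_true, only_false) as sorted lists.

    only_true: symbols occurring only at positions where predicate(x) is True.
    only_false: symbols occurring only where it is False.
    Symbols seen in both kinds of position are left out.
    """
    seen_true, seen_false = set(), set()
    for x, symbol in enumerate(cipher, start=1):
        (seen_true if predicate(x) else seen_false).add(symbol)
    return sorted(seen_true - seen_false), sorted(seen_false - seen_true)


# Prime/non-prime split, and the same idea for odd/even positions.
print(exclusive_symbols("ABCAB", is_prime))                 # (['B', 'C'], ['A'])
print(exclusive_symbols("ABCAB", lambda x: x % 2 == 1))     # (['C'], [])
```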

EDIT: After searching through the massive homophonic substitution thread, I found the citation for the three symbols that land only on odd-numbered positions: viewtopic.php?f=81&t=2617&hilit=odd+daikon&start=210. Perhaps the odd/even phenomenon and the prime phobia phenomenon are related.

Re: Recipes for primephobia

Posted: Wed Dec 23, 2015 8:08 pm
by smokie treats
I am a little bit intrigued by this. At the outset, I see that a lot of the primes are positioned in "period 18" diagonal rows.

primes.1.cycles.png

Back to that later.

I was wondering if there may have been two keys, one for primes and one for non-primes. No dice, though. I separated the symbols on prime positions and moved them down to the last four rows (the 340 has 68 prime positions, exactly four rows of 17). Then I checked the cycle scores for rows 1-16 and for rows 17-20 before and after the separation.

340 rows 1-16 cycle score: 44016
Non-primes rows 1-16 score: 29956

340 rows 17-20 cycle score: 2722
Primes rows 17-20 score: 2748

The 340 is more cyclic without separating the primes from the non-primes. So no amazing discovery and probably no two keys.
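The separation step can be sketched in Python like this, assuming 1-based positions and a stable move (both groups keep their original order); the cycle scoring itself isn't reproduced here, and `primes_to_end`/`rows` are hypothetical helpers. Using the positions themselves as stand-in symbols shows the resulting layout.

```python
def is_prime(n):
    """Trial division; plenty fast for cipher-sized positions."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def primes_to_end(cipher):
    """Move the symbols at prime positions to the end, keeping both groups in order.

    Returns the rearranged list and the number of prime positions.
    """
    non_primes = [s for x, s in enumerate(cipher, start=1) if not is_prime(x)]
    primes = [s for x, s in enumerate(cipher, start=1) if is_prime(x)]
    return non_primes + primes, len(primes)


def rows(seq, width=17):
    """Cut a sequence into rows of `width` symbols (17 columns for the 340)."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


rearranged, n_primes = primes_to_end(list(range(1, 341)))
print(n_primes, n_primes / 17)      # 68 4.0
print(rows(rearranged)[16][:5])     # [2, 3, 5, 7, 11]
```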

There is only one symbol that occurs exclusively on prime positions, symbol 59, and there are only two of those. Not surprisingly, many symbols occur exclusively on non-prime positions. I made that list and then checked the top 34 cycles. No correlation: there aren't any two symbols that are both exclusive to non-primes and have high cycle scores.

primes.2.cycles.png

But I am pondering whether a period 19 transposition scheme could cause high count symbols to avoid prime locations, given that the primes are often situated in "period 18" diagonal rows...

Are the period 19 and prime phobia phenomena related? How difficult is it to make a one-key cipher that unintentionally causes high-count symbols to avoid prime positions? And we need to take a closer look at the other high-count symbols and the statistics.

Re: Recipes for primephobia

Posted: Wed Dec 23, 2015 9:05 pm
by smokie treats
So here are the symbol counts, the prime and non-prime counts, and the expected prime and non-prime counts, based on the fact that 20% of the positions in the 340 (68 of 340) are prime numbers.

primes.3.png

It looks like symbol 19 boxed in red (the +) avoids prime locations, and that may be statistically significant. There are 24 of those, only one lands on a prime position, and about 4.8 would be expected to land on prime positions.

Symbol 20 boxed in blue (the B) also lands on a prime position only once, but about 2.4 would be expected. Not that big of a difference.

Other high-count symbols, 16 and 36 boxed in blue, land on prime positions slightly less often than expected.

High-count symbols 5, 11, and 51 boxed in green land on prime positions more often than expected.
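The expected counts are just count × 68/340. A minimal Python sketch that also puts a probability on the + result, assuming occurrences are placed at random: the hypergeometric version matches a random shuffle (positions drawn without replacement), while the binomial one treats each position as an independent 20% chance. For 24 occurrences with at most one on a prime, both come out at roughly 3%. The function names are hypothetical.

```python
from math import comb


def is_prime(n):
    """Trial division; plenty fast for cipher-sized positions."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


N = 340
K = sum(1 for x in range(1, N + 1) if is_prime(x))  # 68 prime positions (20%)


def expected_on_primes(count, n=N, k=K):
    """Expected number of a symbol's occurrences on prime positions."""
    return count * k / n


def p_at_most_hypergeom(hits, count, n=N, k=K):
    """P(at most `hits` of `count` shuffled occurrences land on primes)."""
    ways = sum(comb(k, i) * comb(n - k, count - i) for i in range(hits + 1))
    return ways / comb(n, count)


def p_at_most_binomial(hits, count, p=K / N):
    """Same question, with each position an independent chance p of being prime."""
    return sum(comb(count, i) * p**i * (1 - p) ** (count - i) for i in range(hits + 1))


print(expected_on_primes(24), expected_on_primes(12))  # 4.8 2.4
print(p_at_most_hypergeom(1, 24), p_at_most_binomial(1, 24))
```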

Re: Recipes for primephobia

Posted: Thu Dec 24, 2015 9:45 pm
by smokie treats
I have an idea for another way to test the statistical significance of the prime phobia phenomenon, to double-check the random shuffle statistics.

Let us say for the sake of argument that the + symbol is a 1:1 substitute.

Take a massive text, like a novel or whatever it is that you guys use to create your n-gram tables. Then take a random 340-letter plaintext sample from the text. Find every plaintext letter that occurs about 24 times within the sample, then find out how many of its occurrences land on prime positions within the sample. Do that a few thousand times and make a bell curve chart. Compare that with the + symbol stats and the random shuffle stats.

If the + maps to two, three, or more plaintext letters, does that matter for the purposes of these random-sample statistics?
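A minimal Python sketch of that sampling test, assuming the corpus is reduced to the letters A-Z and that "about 24" means a count between 22 and 26 (both are my choices; adjust as needed). It returns a histogram of how many occurrences land on prime positions, which is the bell curve to compare against the + symbol. The function names are hypothetical.

```python
import random
from collections import Counter


def is_prime(n):
    """Trial division; plenty fast for cipher-sized positions."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_hit_distribution(corpus, length=340, lo=22, hi=26, samples=5000, seed=0):
    """Histogram {hits_on_primes: how_often} over random windows of the corpus.

    For every letter occurring between lo and hi times in a window, count how
    many of its occurrences sit on (1-based) prime positions.
    """
    letters = [c for c in corpus.upper() if "A" <= c <= "Z"]
    if len(letters) < length:
        raise ValueError("corpus is shorter than one sample")
    primes = [x for x in range(2, length + 1) if is_prime(x)]
    rng = random.Random(seed)
    histogram = Counter()
    for _ in range(samples):
        start = rng.randrange(len(letters) - length + 1)
        window = letters[start:start + length]
        for letter, n in Counter(window).items():
            if lo <= n <= hi:
                histogram[sum(1 for x in primes if window[x - 1] == letter)] += 1
    return histogram


def fraction_at_most(histogram, hits):
    """Share of tallied cases with at most `hits` occurrences on primes."""
    total = sum(histogram.values())
    return sum(v for h, v in histogram.items() if h <= hits) / total if total else 0.0
```

With a large plaintext loaded into `text` (hypothetical), `fraction_at_most(prime_hit_distribution(text), 1)` gives the share of cases with one or zero prime hits.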

Re: Recipes for primephobia

Posted: Fri Dec 25, 2015 8:28 pm
by smokie treats
I made a spreadsheet with all of Jarlve's 100 plaintext messages, which can be found here:

viewtopic.php?f=81&t=2435

Then I set it up so that I can adjust two variables: the minimum count of a plaintext letter in a message, and the maximum number of its occurrences that land on prime positions.

For example, if I set the minimum count to 24 and the maximum number on primes to 1, here are the results:

There are 542 cases of a plaintext letter occurring 24 or more times in a message. As can be expected, they are all high-frequency letters: A, E, H, I, N, O, R, S and T, plus a few messages with 24 or more of D or L.

Of those 542 occurrences, there are only 3 where 1 or fewer of the letter's occurrences land on primes.

Upper left is message #27, where there are 24 of the letter O, and only 1 lands on a prime position.
Upper right is message #84, where there are 32 of the letter A, and only 1 lands on a prime.
Lower left is message #93, where there are 25 of the letter H, and none lands on a prime.

plaintext.100.primephobia.1.png

So that is 3% of the messages, or 3 / 542 = 0.6% of the occurrences with at least 24 of a plaintext letter. So it does happen, but not frequently. I don't know why Zodiac would diffuse with 63 symbols and then make one of them a high-count 1:1 substitute like the + symbol. But even if he did, having only one of them land on a prime position seems statistically significant. And since he did diffuse with 63 symbols, it seems even more statistically significant.

I can change the variables, and you might find it interesting. Will show more thorough statistics soon.
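The spreadsheet filter can be sketched in Python roughly like this. It assumes each message is reduced to the letters A-Z before positions are counted (1-based) and that an "occurrence" is one (message, letter) pair; `prime_avoiders` is a hypothetical helper, not the actual spreadsheet. The two keyword arguments are the two variables.

```python
from collections import Counter


def is_prime(n):
    """Trial division; plenty fast for cipher-sized positions."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_avoiders(messages, min_count=24, max_on_primes=1):
    """Count (message, letter) pairs with at least min_count of the letter.

    Returns (total, matches), where matches lists the pairs with at most
    max_on_primes occurrences on prime positions, as
    (message_number, letter, count, hits_on_primes).
    """
    total = 0
    matches = []
    for number, message in enumerate(messages, start=1):
        letters = [c for c in message.upper() if "A" <= c <= "Z"]
        for letter, count in sorted(Counter(letters).items()):
            if count < min_count:
                continue
            total += 1
            hits = sum(1 for x, c in enumerate(letters, start=1) if c == letter and is_prime(x))
            if hits <= max_on_primes:
                matches.append((number, letter, count, hits))
    return total, matches
```

Running it over the 100 messages with `min_count=24, max_on_primes=1` would be the red-box setting; `min_count=21` with `max_on_primes=0` or `3` the blue boxes.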

EDIT: So I may be comparing apples to oranges instead of apples to apples. I don't know, but this may give some more perspective.

I made a table and changed the variables.

Blue box, upper left: In Jarlve's 100-message plaintext library, there are 705 occurrences where the same plaintext letter appears 21 or more times in a message. In 2 of those, none of its occurrences land on a prime position.

Blue box, lower left: The same 705 occurrences. In 162 of those, the letter lands on 3 or fewer prime positions. So if I change the variable for prime positions from 0 to 3, the count of occurrences changes dramatically, from 2 to 162.

Red box: Again, in Jarlve's library there are 542 occurrences where the same plaintext letter appears 24 or more times in a message. In 3 of those, 1 or fewer of its occurrences land on prime positions.

plaintext.100.primephobia.2.png

In 3% of random shuffles of the 340, the + lands on only one prime position, and 3% of Jarlve's messages show similar statistics. I am not an expert in interpreting statistics, but because Zodiac used 63 symbols to diffuse and there is only one symbol with a count of 24, this may be statistically important. I am just trying to think of a cipher model that would explain prime phobia, the period 19 bigram repeats, three symbols cycling together on exclusively odd positions, and the cycle statistics in general.

One final thought for the night. I made the 340 into a message that is 18 columns by 19 rows, which is conducive to a period 19 transposition scheme. The prime positions line up with each other in columns.

plaintext.100.primephobia.3.png

The 19 is the + symbol. Can anyone think of a simple relationship between transposition and prime phobia?
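One simple relationship is arithmetic rather than cryptographic: every prime greater than 3 is one more or one less than a multiple of 6, so modulo 18 it can only be 1, 5, 7, 11, 13 or 17. In an 18-column layout the primes therefore fill only those six columns (plus positions 2 and 3 in columns 2 and 3). In a 17-column layout a step of 18 moves one row down and one column right (wrapping at the edge), so the same residues show up as diagonals, which may be what the "period 18" diagonal rows are reflecting. A quick Python check, with `prime_columns` as a hypothetical helper:

```python
from collections import Counter


def is_prime(n):
    """Trial division; plenty fast for cipher-sized positions."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_columns(n=340, width=18):
    """Count prime positions per column when 1..n is written row by row in `width` columns."""
    columns = Counter((x - 1) % width + 1 for x in range(1, n + 1) if is_prime(x))
    return dict(sorted(columns.items()))


print(prime_columns(340, 18))
```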