
Recipes for primephobia

Posted: Wed Dec 09, 2015 3:55 am
by _pi
As demonstrated before, the z340 exhibits a peculiar characteristic: the 2 most frequent symbols ('+' and 'B') fall almost exclusively on non-prime positions. Of all 36 instances, only 2 fall on a prime position. This is statistically odd, as measured by Doranchak:

Shuffle experiments show that + will fall only on 0 or 1 prime positions in 3% of shuffles. In 0.7% of shuffles, + and B each fall on 0 or 1 prime positions. So, I can't easily dismiss the phenomenon as coincidence. More info here: http://www.zodiackillerciphers.com/?p=319
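
As a rough cross-check of the figure quoted for +, the chance that randomly placed symbols land on at most one prime position can be computed exactly instead of by shuffling. This is only a sketch: it assumes 24 + symbols placed uniformly at random among 340 positions, and the function names are illustrative.

```python
from math import comb

def is_prime(n: int) -> bool:
    """Trial-division primality test, sufficient for positions up to 340."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def prob_at_most(max_hits: int, n_positions: int = 340, n_symbols: int = 24) -> float:
    """P(at most max_hits of n_symbols randomly placed symbols sit on primes)."""
    n_primes = sum(is_prime(p) for p in range(1, n_positions + 1))
    total = comb(n_positions, n_symbols)
    # Hypergeometric count: choose k prime positions and the rest non-prime.
    favourable = sum(
        comb(n_primes, k) * comb(n_positions - n_primes, n_symbols - k)
        for k in range(max_hits + 1)
    )
    return favourable / total

print(f"{prob_at_most(1):.3f}")
```

Under those assumptions the result should land in the neighbourhood of the quoted 3%.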


While this primephobia of the most frequent symbols could be a coincidence, it could also be a symptom of the cipher's construction methodology.

The purpose of this post is to present 2 encryption methods that substantially increase the odds of inducing such primephobia in the resulting ciphers.

Primes in columns
When listing a series of numbers in a table format, an interesting phenomenon can be observed. For example, let's list all numbers from 1 to 340 in a table of 6 columns. In this table, I have highlighted all the prime numbers in orange:

prime1.png

You'll notice that, if we exclude the very first line of numbers, all the prime numbers are positioned in columns 1 and 5. Columns 2, 3, 4 and 6, highlighted in green, are prime-safe (again, excluding the first line), meaning that no prime number can be found in these columns. This is because every number in columns 2, 4 and 6 is divisible by 2 and every number in column 3 is divisible by 3.

The appearance of these "prime-safe" columns is entirely dependent on the number of columns chosen to display the list of numbers. As a second example, here is the same list of numbers organised in a 7-column table:

prime2.png

You'll notice that, excluding the first line, only the 7th column is prime-safe, since every number in that column is divisible by 7. Any other column can potentially host a prime number.

Here are a few examples of prime-safe columns according to the number of columns used to display the list of numbers:

Code:
# Columns    Prime-safe Columns
---------------------------------------
5            5
6            2, 3, 4, 6
7            7
8            2, 4, 6, 8
9            3, 6, 9
10           2, 4, 5, 6, 8, 10
...
17           17
...
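
The table above can be reproduced with a few lines of code. This is a minimal sketch, assuming positions 1 to 340 and ignoring the first row as described earlier; the function names are illustrative.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, sufficient for positions up to 340."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def prime_safe_columns(n_cols: int, length: int = 340) -> list[int]:
    """Return the 1-based columns that hold no prime below the first row."""
    safe = []
    for col in range(1, n_cols + 1):
        # Positions in this column, starting from the second row.
        positions = range(col + n_cols, length + 1, n_cols)
        if not any(is_prime(p) for p in positions):
            safe.append(col)
    return safe

for n_cols in (5, 6, 7, 8, 9, 10, 17):
    print(n_cols, prime_safe_columns(n_cols))
```

The prime-safe columns turn out to be exactly those whose index shares a factor with the number of columns, which is why a prime column count such as 7 or 17 leaves only its last column safe.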


If a cipher construction method were to intrinsically exploit this prime-safe column phenomenon, it would increase the probability of yielding primephobic ciphers. In other words, if the construction method somehow funneled high-frequency symbols into prime-safe columns, it would greatly increase the yield of primephobic ciphers.

Recipe #1: Vigenère
The Vigenère cipher is a method of encrypting alphabetic text by using a series of different Caesar ciphers based on the letters of a keyword. It is a simple form of polyalphabetic substitution.
[...]
In a Caesar cipher, each letter of the alphabet is shifted along some number of places; for example, in a Caesar cipher of shift 3, A would become D, B would become E, Y would become B and so on. The Vigenère cipher consists of several Caesar ciphers in sequence with different shift values.
[...]
The alphabet used at each point depends on a repeating keyword. - Wikipedia


This repeating keyword makes the Vigenère encoding very cyclical. If the keyword is 5 characters long, there will be 5 different encoding alphabets, repeated in order: 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, etc. Put another way, if the plaintext is displayed in a grid of 5 columns, every letter in a given column is encoded with the same alphabet. This cyclical quality of Vigenère is therefore very compatible with the prime-safe notion explained above.

For example, a random English plaintext of 340 characters would contain on average about 43 E's and 30 T's. When this plaintext is formatted in a grid of 6 columns, these letters are spread randomly across all columns. Now, let's say we encode this plaintext using Vigenère with the keyword "QDEEZE". When an E in the plaintext is encoded with an E in the keyword, an "I" is obtained. When a T in the plaintext is encoded with a D in the keyword, a "W" is obtained. Since the keyword is 6 characters long, and the keyword letters D and E are in positions 2, 3, 4 and 6 (all prime-safe columns), this encoding process will funnel a large share of the resulting I and W symbols into prime-safe columns.
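
To make the funneling concrete, here is a small sketch of a Vigenère encoder together with a helper that lists, for each column of the keyword, which plaintext letter produces a given cipher letter. It assumes uppercase A-Z text with no spaces; `sources_by_column` is a hypothetical helper name.

```python
def vigenere_encrypt(plaintext: str, keyword: str) -> str:
    """Shift the i-th letter by the keyword letter at index i mod len(keyword)."""
    out = []
    for i, ch in enumerate(plaintext):
        shift = ord(keyword[i % len(keyword)]) - ord("A")
        out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
    return "".join(out)

def sources_by_column(cipher_letter: str, keyword: str) -> dict:
    """Map each 1-based column to the plaintext letter that yields cipher_letter."""
    return {
        col: chr((ord(cipher_letter) - ord(k)) % 26 + ord("A"))
        for col, k in enumerate(keyword, start=1)
    }

print(sources_by_column("I", "QDEEZE"))
print(sources_by_column("W", "QDEEZE"))
```

With "QDEEZE", an I in columns 3, 4 and 6 comes from a plaintext E and a W in column 2 comes from a plaintext T, while in columns 1 and 5 those cipher letters can only come from less frequent plaintext letters.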

By generating random English plaintexts of 340 characters, Vigenère-encoding them with the "QDEEZE" keyword, and keeping only the resulting ciphers in which the symbols I and W total 36 occurrences (to mimic the frequency of + and B in the z340), we get a staggering 54% of ciphers that exhibit a primephobia on these symbols equal to or higher than that of + and B in the z340. By comparison, only 0.7% of random shuffles of the z340 exhibit a primephobia equal to or higher than that of the original z340.
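
An experiment of this kind could be sketched as follows. This is not the code behind the 54% figure: it assumes plaintext letters drawn independently from an approximate English frequency table rather than real English text, so the share it prints should not be expected to match that figure exactly. All names are illustrative.

```python
import random

# Approximate English letter frequencies in percent (an assumption; a real
# text corpus could be substituted for this independent-letter model).
FREQ = {
    "E": 12.7, "T": 9.1, "A": 8.2, "O": 7.5, "I": 7.0, "N": 6.7, "S": 6.3,
    "H": 6.1, "R": 6.0, "D": 4.3, "L": 4.0, "C": 2.8, "U": 2.8, "M": 2.4,
    "W": 2.4, "F": 2.2, "G": 2.0, "Y": 2.0, "P": 1.9, "B": 1.5, "V": 1.0,
    "K": 0.8, "J": 0.15, "X": 0.15, "Q": 0.1, "Z": 0.07,
}
LENGTH = 340
PRIMES = {n for n in range(2, LENGTH + 1)
          if all(n % d for d in range(2, int(n ** 0.5) + 1))}

def vigenere_encrypt(plaintext: str, keyword: str) -> str:
    """Shift the i-th letter by the keyword letter at index i mod len(keyword)."""
    return "".join(
        chr((ord(ch) + ord(keyword[i % len(keyword)]) - 2 * ord("A")) % 26 + ord("A"))
        for i, ch in enumerate(plaintext)
    )

def hits_on_primes(cipher: str, symbols: str) -> int:
    """Count occurrences of the given symbols on prime 1-based positions."""
    return sum(1 for pos, ch in enumerate(cipher, start=1)
               if ch in symbols and pos in PRIMES)

def primephobic_share(keyword="QDEEZE", symbols="IW", total=36,
                      max_prime_hits=2, wanted=200, seed=0) -> float:
    """Share of kept ciphers with at most max_prime_hits symbols on primes."""
    rng = random.Random(seed)
    letters, weights = list(FREQ), list(FREQ.values())
    kept = phobic = 0
    while kept < wanted:
        plain = "".join(rng.choices(letters, weights=weights, k=LENGTH))
        cipher = vigenere_encrypt(plain, keyword)
        if sum(cipher.count(s) for s in symbols) != total:
            continue  # keep only ciphers matching the 36 occurrences of + and B
        kept += 1
        if hits_on_primes(cipher, symbols) <= max_prime_hits:
            phobic += 1
    return phobic / kept

if __name__ == "__main__":
    print(f"{primephobic_share():.0%}")
```

Changing `keyword` to one whose frequent-letter shifts sit in columns 1 or 5 is a quick way to see how strongly the result depends on the keyword.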

The size of the keyword, its letters and their positions in the keyword will have a dramatic impact on the likelihood of producing a primephobic cipher.

Recipe #2: progressive key polyalphabetic cipher
This method consists of switching encoding alphabets for each letter of the plaintext. If 5 alphabets are defined, the 1st plaintext letter is encoded using alphabet #1, the 2nd with alphabet #2, etc. The 6th letter is encoded with alphabet #1 again, and so forth. The number of defined alphabets dictates how frequently the encoder cycles through them, which is the same principle as the number of characters in a Vigenère keyword.

For example, let's consider this partial encoding table consisting of 6 alphabets where only the +, B and X symbols are mapped:

prime3.png

This means that a + symbol would be decoded to either an E, A or D, depending on where that symbol is found (i.e. which alphabet is used), and a B symbol would correspond to the letters B, C or H. With such an encoding table, the + and B symbols would only fall on columns 2, 3, 4 and 6 (all prime-safe columns), thus yielding a rate close to 100% of ciphers being more primephobic than the z340.
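
The mechanism can be sketched in code. The table below is hypothetical: the actual assignments are in the image and are not reproduced here, so the pairing of letters to columns is an assumption chosen only to match the description (+ for E, A or D and B for B, C or H, in columns 2, 3, 4 and 6). The X symbol is left out, and unmapped letters are written as a dot.

```python
N_ALPHABETS = 6

# Hypothetical partial table: column -> {plaintext letter: cipher symbol}.
PARTIAL_TABLE = {
    2: {"E": "+", "B": "B"},
    3: {"A": "+", "C": "B"},
    4: {"D": "+", "H": "B"},
    6: {"E": "+", "B": "B"},
}

def column_of(pos: int) -> int:
    """1-based column (alphabet number) used at a 1-based position."""
    return (pos - 1) % N_ALPHABETS + 1

def encode(plaintext: str) -> str:
    """Encode with the alphabet selected by position; '.' marks unmapped letters."""
    return "".join(
        PARTIAL_TABLE.get(column_of(pos), {}).get(letter, ".")
        for pos, letter in enumerate(plaintext, start=1)
    )

def symbol_columns(cipher: str, symbol: str) -> set:
    """Set of columns in which a symbol occurs."""
    return {column_of(pos) for pos, ch in enumerate(cipher, start=1) if ch == symbol}

print(encode("EEEEEE"))
```

Whatever the plaintext, + and B can only come out of alphabets 2, 3, 4 and 6 here, so below the first row they never sit on a prime position.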

Again, the number of alphabets and how the symbols are assigned to plaintext letters will greatly affect the primephobic cipher yield. But, as with Vigenère, the interesting conclusion is that both these encoding schemes have a demonstrable and significant impact on primephobia by concentrating symbols in prime-safe columns.

I think it is possible that the primephobia exhibited by the z340 indicates that a similar cyclical approach (polyalphabetic or otherwise) was used, under conditions that happened to concentrate the + and B symbols in prime-safe columns.

_pi

Re: Recipes for primephobia

Posted: Wed Dec 09, 2015 4:21 am
by glurk
_pi-

That is fantastic work, and absolutely fascinating. I have never thought about nor seen "primes in columns" like that before!
Now what to actually DO with it, I don't know... But I'm certainly interested in where this goes. :ugeek:

-glurk

Re: Recipes for primephobia

Posted: Wed Dec 09, 2015 6:00 am
by smokie treats
I agree that Zodiac could have cycled multiple keys and that could be the explanation for the prime phobia phenomenon. Thank you for showing that with such an easy to understand presentation.

Re: Recipes for primephobia

Posted: Wed Dec 09, 2015 6:24 am
by doranchak
Nice job, pi! This is a very compelling avenue to explore. Thanks for taking the time to put this writeup together. It would be interesting to simulate these possible construction methods more closely, to see how much the results resemble the actual 340.

Re: Recipes for primephobia

Posted: Wed Dec 09, 2015 12:48 pm
by Jarlve
Interesting and refreshing post _pi.

_pi wrote: If a cipher construction method were to intrinsically exploit this prime-safe column phenomenon, it would increase the probability of yielding primephobic ciphers. In other words, if the construction method somehow funneled high-frequency symbols into prime-safe columns, it would greatly increase the yield of primephobic ciphers.

Can we come up with something like that which is not specifically Vigenère and also distributes over a chosen number of characters which behaves like the 340? What we believe to be homophonic substitution in the 340 may be such a process. I made an example algorithm like this a while ago, but it doesn't seem to correlate well with the 340: viewtopic.php?f=81&t=2218&p=29105&hilit=mimic#p29105

Also I'm not 100% sure but I remember something about the "+" symbol positions being close to modulo 5 and 10. It was just an observation without a math check.

Here is a cipher which uses Vigenère and homophonic substitution encoding with a 5-letter keyword.

Code:
1  2  3  4  5  6  7  8  9  5  10 11 12 13 14 15 16
17 10 18 19 8  9  20 21 22 23 24 25 26 13 27 2  28
29 1  30 31 32 24 17 4  7  8  20 33 9  34 23 3  35
36 37 27 14 38 39 40 25 6  11 28 24 38 14 41 42 31
38 22 11 29 43 37 39 44 26 45 46 10 33 11 36 35 26
5  28 1  28 30 12 25 43 15 47 19 41 31 20 25 4  16
48 49 50 12 32 14 2  26 40 24 37 17 6  44 7  21 18
20 3  13 34 17 24 27 16 32 8  35 45 42 41 29 43 21
38 40 17 14 1  5  21 2  46 9  42 37 2  10 42 23 13
15 39 19 30 8  47 11 21 4  28 29 23 1  2  30 10 18
44 34 12 9  45 27 32 33 38 34 35 25 22 36 46 41 47
45 15 35 43 47 40 20 36 19 3  29 49 12 11 28 28 29
30 45 4  9  42 44 1  47 7  10 20 39 25 31 37 15 10
26 3  47 35 4  7  48 49 46 8  9  50 20 36 24 13 40
27 50 28 17 16 18 8  43 5  33 43 21 26 7  14 46 17
41 16 43 35 34 26 38 1  29 11 19 24 7  1  19 41 11
9  37 15 8  36 50 21 10 41 23 1  2  14 40 16 34 49
6  45 7  40 27 42 23 31 50 38 34 22 30 24 25 14 8
27 33 6  14 26 38 27 3  5  41 16 27 34 15 14 50 45
35 49 11 28 32 49 28 44 20 10 24 2  46 30 3  49 12

5G&7J?FD[J;<*'M=Y
C;B:D[,3UNQ1+'@GS
.5"V2QC7FD,![9N&X
P/@M#>I1?<SQ#MZ\V
#U<.O/>H+-$;!<PX+
JS5S"*1O=%:ZV,17Y
LTE*2MG+IQ/C?HF3B
,&'9CQ@Y2DX-\Z.O3
#ICM5J3G$[\/G;\N'
=>:"D%<37S.N5G";B
H9*[-@2!#9X1UP$Z%
-=XO%I,P:&.T*<SS.
"-7[\H5%F;,>1V/=;
+&%X7FLT$D[E,PQ'I
@ESCYBDOJ!O3+FM$C
ZYOX9+#5.<:QF5:Z<
[/=DPE3;ZN5GMIY9T
?-FI@\NVE#9U"Q1MD
@!?M+#@&JZY@9=ME-
XT<S2TSH,;QG$"&T*

Re: Recipes for primephobia

Posted: Wed Dec 09, 2015 1:06 pm
by doranchak
Jarlve wrote:Also I'm not 100% sure but I remember something about the "+" symbol positions being close to modulo 5 and 10. It was just an observation without a math check.


Here are the positions. Modulo 5 and 10 are in boldface:

20 40 64 65 72 81 105 128 133 140 142 159 162 172 201 211 237 238 255 276 282 290 291 340

7 out of 24 +'s fall on positions that are evenly divisible by 5 or 10.

I generated 10,000 sets of 24 positions randomly drawn from 340 positions. Here is the distribution of number of positions with factors 5 or 10:

Code:
count, number of positions with factors 5 or 10
  43, 0
 241, 1
 791, 2
1483, 3
2023, 4
2038, 5
1579, 6
1009, 7
 506, 8
 203, 9
  58, 10
  20, 11
   4, 12
   2, 13


So about 18% of all shuffles had 7 or more +'s falling on positions divisible by 5 or 10.
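
A simulation along these lines might look like the sketch below. It is not the code used for the table above; the seed, function name and output format are assumptions. Since every multiple of 10 is also a multiple of 5, testing divisibility by 5 is enough.

```python
import random
from collections import Counter

def simulate(n_trials=10_000, n_positions=340, n_symbols=24, divisor=5, seed=0):
    """Tally how many of n_symbols random positions are divisible by divisor."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_trials):
        # Draw distinct positions, as a shuffle of the cipher would.
        drawn = rng.sample(range(1, n_positions + 1), n_symbols)
        tally[sum(1 for p in drawn if p % divisor == 0)] += 1
    return tally

tally = simulate()
for hits in sorted(tally):
    print(f"{tally[hits]:5d}, {hits}")

tail = sum(count for hits, count in tally.items() if hits >= 7) / sum(tally.values())
print(f"share with 7 or more: {tail:.1%}")
```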

Re: Recipes for primephobia

Posted: Wed Dec 09, 2015 1:17 pm
by ace ventura
Forget it, a 1000 people working for a 1000 years could not crack it, and if they did, how would they know they have it without the code key... but + is all but one vowel and frequent letters... I think

Re: Recipes for primephobia

Posted: Wed Dec 09, 2015 1:30 pm
by Jarlve
Thanks doranchak,

I meant 5 or 10. And that some are close, 64, 81, 159, 201, 211, 291. It's something that I noticed more than a year ago, when I first started working on the 340. I guess it ain't much but _pi's 5 column distribution example reminded me of that.

Modulo 5: 20, 40, 64, 65, 72, 81, 105, 128, 133, 140, 142, 159, 162, 172, 201, 211, 237, 238, 255, 276, 282, 290, 291, 340.

Re: Recipes for primephobia

Posted: Wed Dec 09, 2015 1:38 pm
by doranchak
Hmm, too many numbers would be included if we marked the "close" matches too.

[1] 2 3 [4] [5] [6] 7 8 [9] [10] [11] 12 13 [14] [15] ...etc...

60% of all integers are divisible by 5 or are only 1 away from such a number.
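
The 60% figure is easy to confirm by counting. A short sketch over the 340 positions, with an illustrative function name:

```python
def near_multiple_share(limit: int = 340, divisor: int = 5, tolerance: int = 1) -> float:
    """Share of 1..limit lying within `tolerance` of a multiple of `divisor`."""
    hits = sum(
        1 for n in range(1, limit + 1)
        # Distance to the nearest multiple, looking both down and up.
        if min(n % divisor, divisor - n % divisor) <= tolerance
    )
    return hits / limit

print(near_multiple_share())
```

Three of every five consecutive integers qualify (remainders 0, 1 and 4), which is where the 60% comes from.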

Re: Recipes for primephobia

Posted: Wed Dec 09, 2015 7:24 pm
by _pi
Thanks for the good words and feedback!

Jarlve wrote: Can we come up with something like that which is not specifically Vigenère and also distributes over a chosen number of characters which behaves like the 340? What we believe to be homophonic substitution in the 340 may be such a process.


One way I found to channel certain symbols into prime-safe columns, as I described in my original post, without involving a poly-alphabetic approach, is to assign a homophonic symbol to a letter based on its position in the plaintext. This way, you again find the cyclical component required to induce this prime-safe effect.

So, in the context of a purely homophonic substitution cipher, let's say the letter E can be mapped to the following 6 symbols: !, @, #, $, % and ?. When encoding each E in the plaintext, instead of randomly assigning a symbol or following a traditional cycle through all 6 symbols, you would advance in the cycle at each position in the cipher. At position 1 in the cipher, if an E is present, it would be mapped to !. At position 2, if an E is present, it would be mapped to @, and so on. Following this logic, symbols @, #, $ and ? would be highly primephobic in the resulting cipher, as they lie in prime-safe columns of this homophonic assignment cycle.
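
Here is a minimal sketch of that positional assignment, handling only the letter E and writing every other letter as a dot (a simplification made for illustration). The function names are hypothetical.

```python
E_SYMBOLS = "!@#$%?"  # the 6 homophones for E, indexed by position in the cycle

def is_prime(n: int) -> bool:
    """Trial-division primality test, sufficient for positions up to 340."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def encode_e(plaintext: str, filler: str = ".") -> str:
    """Map each E to the homophone selected by its 1-based position."""
    return "".join(
        E_SYMBOLS[(pos - 1) % len(E_SYMBOLS)] if letter == "E" else filler
        for pos, letter in enumerate(plaintext, start=1)
    )

def prime_hits(cipher: str, symbols: str) -> int:
    """Count occurrences of the given symbols on prime positions."""
    return sum(1 for pos, ch in enumerate(cipher, start=1)
               if ch in symbols and is_prime(pos))

worst_case = encode_e("E" * 340)  # an E at every position
print(prime_hits(worst_case, "@#$?"), prime_hits(worst_case, "!%"))
```

Even in the extreme case of an E at every position, @, #, $ and ? can only touch primes at positions 2 and 3 in the first row, while ! and % pick up all the others.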

This approach was clearly not used in the z408 and probably not in the z340 but, for the sake of this conversation, it would be a homophonic mono-alphabetic encoding methodology yielding a high ratio of prime phobic ciphers.

Pushing this idea further, using such a positional way of assigning homophonic symbols could be a way to avoid confusion in the case where polyphones are involved. Let's say that the z340 is similar in construction to the z408 but it contains way more polyphones. One way to make the cipher decodable would be to select the polyphone assignment based on its position. For example, the + symbol could be translated to E in position 1, B in position 2, etc.