Spreadsheet Encoder Statistics
I briefly suspended making digraph - transposition - homophonic messages. I had made 200 of these messages, and 200 transposition - homophonic messages, but realized that I may be comparing apples to oranges instead of apples to apples, because I was guessing at my encoder variables to try to emulate 340 stats.
So I re-tooled my encoder so that I can encode 16 messages at a time. The encoder works like this. It makes a key, depending on letter count, at one of four different efficiency settings, from 1 = most efficient diffusion to 4 = least efficient diffusion.
encoder.stats.1.png
I can also make the spreadsheet map a polyphone (or more than one) to anywhere from zero to any number of the highest frequency plaintext letters.
I made 16 messages for each plaintext in the Jarlve 100 plaintext library (I changed message 24 because it was exactly the same as message 23). Four different efficiency settings * four polyphone settings = 16 messages per plaintext, and 16 * 100 = 1600 messages. All perfect cycles.
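A minimal sketch of how the 16 settings and the polyphone letter choice could be laid out. The encoder itself is a spreadsheet, so everything here is hypothetical: the names `settings_grid` and `polyphone_letters`, the polyphone spans (0, 2, 3, 4), and the alphabetical tie-break are assumptions, not the actual spreadsheet logic.

```python
from collections import Counter
from itertools import product

EFFICIENCY_SETTINGS = (1, 2, 3, 4)  # 1 = most efficient, 4 = least efficient
POLYPHONE_SPANS = (0, 2, 3, 4)      # assumed: 0 = no polyphone, else top-k letters


def settings_grid():
    """Return every (efficiency, polyphone span) combination."""
    return list(product(EFFICIENCY_SETTINGS, POLYPHONE_SPANS))


def polyphone_letters(plaintext, k):
    """Return the k highest frequency letters that one polyphone would cover."""
    counts = Counter(c for c in plaintext.upper() if c.isalpha())
    # Highest count first; ties broken alphabetically so the result is stable.
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return ranked[:k]


if __name__ == "__main__":
    grid = settings_grid()
    print(len(grid), "settings per plaintext,", len(grid) * 100, "messages in total")
    print(polyphone_letters("ILIKEKILLINGPEOPLE", 3))
```

With 100 plaintexts, the 16 combinations give the 1600 messages described above.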
Here are the stats.
1. The top table shows the polyphone symbol count. The X axis is the key efficiency, and the Y axis is the number of highest frequency letters that I mapped the one polyphone to. For the bottom row, I did not use a polyphone. For the next row up, I mapped one polyphone to the two highest frequency letters. And so on. In the Zodiac 340, the + symbol has a count of 24. The yellow shaded areas show the combinations of key efficiency and polyphone mapping whose range, mean ± standard deviation, the 340 would fall in.
2. The second table shows the RAW IOC calculated without including the polyphone. The bottom row does not reflect a polyphone. Even though 1200 messages for the top three rows had one polyphone, the existence of the polyphone did change the stats. The 340 RAW IOC is 1684 if you do not include the + symbol. IOC for a homophonic substitution message is IOC for a homophonic substitution message, I guess.
3. The third table, one cell, shows the count of period 1 bigram repeats in the plaintext, which would become period x repeats in a transposed message.
4. The fourth table shows the count of period 1 bigram repeats after diffusion. The yellow shaded cells show that it is easier, by far, to make messages with 340 period 15 / 19 stats with an efficiency setting of 3 or 4. But the 340 is at the higher end of the setting 3 range. It is much easier to match 340 stats with setting 4, a very inefficient key.
EDITED:
encoder.2.png
5. The fifth and bottom table shows the average of the period 1 repeat probability scores. This is the one that I am interested in because the 340 period 15 / 19 score is 16.8. More efficient keys cause more diffusion, but there will be some repeats that drive up the score because there are not very many symbols mapping to lower frequency plaintext. Like this one. The idea regarding digraph - transposition - homophonic is that the digraph diffusion makes it so that the homophonic key has 26 letters to encode, not the typical 22-24. The 63 symbols would be distributed more evenly, making more of these repeats and driving up the score. The 340 period 15 / 19 repeat score of 16.8 ranks 2nd of all periods. The idea is that a digraph - transposition - homophonic message will have higher ranking scores. He could have drafted the message vertically in 15 or 19 columns, encoded it with a digraph cipher, then re-drafted it into 17 columns and encoded it homophonically.
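The bigram repeat counts in the third and fourth tables can be sketched roughly as below. This assumes a period p bigram is the pair of symbols at positions i and i + p, and that each extra occurrence of the same pair counts as one repeat. The spreadsheet may count differently, and the probability score from the fifth table is not reproduced here. The function name `period_bigram_repeats` is hypothetical.

```python
from collections import Counter


def period_bigram_repeats(symbols, period=1):
    """Count repeated bigrams, pairing each symbol with the one `period` places later."""
    pairs = Counter(
        (symbols[i], symbols[i + period])
        for i in range(len(symbols) - period)
    )
    # A pair seen n times contributes n - 1 repeats (assumed convention).
    return sum(n - 1 for n in pairs.values() if n > 1)


if __name__ == "__main__":
    text = "ABCABCAB"
    print(period_bigram_repeats(text, period=1))
    print(period_bigram_repeats(text, period=3))
```

Running the same function over a range of periods on a ciphertext would show which periods stand out, which is how a period 15 or 19 peak would appear.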
EDIT: The more efficient keys make messages that have higher average period 1 repeat probability scores because they cause more repeats like this one.
340.29.42.png
So any time that I want to try a new cipher, I need to do this for the cipher type first, so that I know what settings to use.
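For anyone repeating this for another cipher type, here is a minimal sketch of the RAW IOC from the second table. It assumes RAW IOC means the unnormalized sum of n * (n - 1) over the symbol counts; the name `raw_ioc` and the `exclude` parameter are hypothetical. Under that definition a symbol with a count of 24 contributes 24 * 23 = 552 on its own, which is why leaving out the polyphone matters.

```python
from collections import Counter


def raw_ioc(symbols, exclude=()):
    """Sum n * (n - 1) over symbol counts, skipping any excluded symbols."""
    counts = Counter(s for s in symbols if s not in exclude)
    return sum(n * (n - 1) for n in counts.values())


if __name__ == "__main__":
    toy = "AABBBC+++"                  # toy message, not real cipher data
    print(raw_ioc(toy))                # polyphone '+' included
    print(raw_ioc(toy, exclude="+"))   # polyphone '+' left out
```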
Unfortunately, there is no cell that is shaded yellow in all of the tables. But the mean ± standard deviation range only covers about 2/3 of the messages. Efficiency setting 3 with one polyphone mapping to 3 plaintext letters comes the closest, but the standard deviation range for repeats still barely reaches 340 stats.
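The yellow shading test can be sketched as a check of whether a 340 stat lies within one standard deviation of the mean for a cell. This assumes the sample standard deviation is used; the spreadsheet may use the population version. The function names and the sample values below are made-up placeholders, not results from the 1600 messages.

```python
from statistics import mean, stdev


def one_sd_range(values):
    """Return (mean - sd, mean + sd) using the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return (m - s, m + s)


def is_shaded(values, target):
    """True if target lies inside the mean +/- one standard deviation range."""
    low, high = one_sd_range(values)
    return low <= target <= high


if __name__ == "__main__":
    scores = [10, 12, 14, 16, 18]    # placeholder scores for one cell
    print(one_sd_range(scores))
    print(is_shaded(scores, 16.8))   # 16.8 is the 340 period 15 / 19 score
```

For roughly normal data about 68% of messages fall inside this range, so a cell that is not shaded can still produce a 340-like message now and then.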