BartW wrote:Hi Smokie... Silly question time...
What is your definition of a "shuffle"? You and David mention it often, but I assume the following:
The PT is based on random characters, which are then keyed randomly with a 63-symbol homophonic cipher. Is this correct?
We just randomly reorganize, or scramble, the 340. I re-draft the message into a 340 column x 1 row grid, generate a random number for each ciphertext symbol, sort by the random number, and then re-draft again into a 17 x 20 grid.
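The procedure above could be sketched in Python roughly as follows. This is only an illustration: the function names and the placeholder message are hypothetical, and the 17 x 20 grid is assumed to mean 17 columns by 20 rows.

```python
import random

def shuffle_cipher(symbols, rng=random):
    """Tag each symbol with a random number, sort by that number,
    and return the symbols in their new order."""
    keyed = [(rng.random(), s) for s in symbols]
    keyed.sort(key=lambda pair: pair[0])
    return [s for _, s in keyed]

def to_grid(symbols, width=17):
    """Re-draft a flat sequence into rows of `width` symbols."""
    return [symbols[i:i + width] for i in range(0, len(symbols), width)]

# Placeholder message (NOT the real 340): 340 positions drawn from 63 symbols.
message = [i % 63 for i in range(340)]
grid = to_grid(shuffle_cipher(message))
```

Sorting by a random key gives the same kind of result as `random.shuffle`; it is written this way only to mirror the spreadsheet-style steps described above.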
I think that we use a shuffle test before anything else because it is very quick and easy to do, and we want to determine whether we should take a closer look at any particular phenomenon. The combination Vigenere homophonic message that I made for you had spikes at increments of 12. How many times would you have to shuffle that message to get such spikes at any increment? Probably a lot, so if you didn't actually know what kind of message it was, you would think that the spikes may be evidence of the cipher and need to be investigated further. Doranchak shuffled the 340 one million times and there were only 2,782 spikes (0.28%) at 18 or higher, so we are taking a closer look.
Personally, I think that the shuffle test has only so much value, but it is a tool of economy. People have tried so many different theories on the 340 and none of them has worked so far. I think that doranchak is using the shuffle test to start with because he wants to use his time as efficiently as possible.
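A minimal sketch of such a shuffle test is below. It assumes a "spike" is the number of positions where a symbol equals the symbol a fixed number of positions later, and it counts a shuffle as a hit if any of the periods you pass in reaches the threshold. Which periods doranchak actually checked is not stated here, so `periods` is left as a parameter; all names are hypothetical.

```python
import random

def count_repeats(symbols, period):
    """Number of positions i where symbols[i] == symbols[i + period]."""
    return sum(1 for i in range(len(symbols) - period)
               if symbols[i] == symbols[i + period])

def shuffle_test(symbols, threshold, periods, trials=1000, seed=None):
    """Fraction of random shuffles in which at least one of the given
    periods has `threshold` or more repeats."""
    rng = random.Random(seed)
    work = list(symbols)  # shuffle a copy, leave the input untouched
    hits = 0
    for _ in range(trials):
        rng.shuffle(work)
        if any(count_repeats(work, p) >= threshold for p in periods):
            hits += 1
    return hits / trials
```

A call such as `shuffle_test(cipher, 18, [78], trials=100000)` would estimate how often chance alone produces 18 or more repeats at period 78. A small fraction suggests the spike deserves a closer look; it does not say what caused it.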
BartW wrote:In the following
score = ln ( 1 / ( ( ( symbol count / 340 ) * ( symbol count / 340 ) ) ^ number of repeats ) )
I assume Symbol count = 63?
I know natural log etc but what is the relevance to the equation?
Natural log (squared (chance) to the power of instances)
Symbol count is the count of a particular symbol, not the number of distinct symbols. There are 24 of the + symbol (my symbol 19), and four repeats: four positions where a + symbol occurs 78 positions away from another + symbol. So the score is 21.25. But I shuffled the 340 for a while and found that it is very easy to get four or more repeats with the + symbol. One possible explanation for the spike is that it is created by the + symbol repeats alone; without them, you wouldn't have a spike. Maybe homophonic encoding just caused some of the + symbols to align themselves at intervals of 78 positions, and that is what you detected. I just shuffled the 340 six times and got four repeats for the + symbol at x = 10. The spike is 17, but that is easier to achieve than 18.
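Reading the formula exactly as written in the question, the score for one symbol could be computed like this. The function and parameter names are hypothetical; only the arithmetic comes from the formula.

```python
import math

def repeat_score(symbol_count, repeats, length=340):
    """ln(1 / ((count/length)^2)^repeats): the squared frequency is the
    chance that two given positions both hold this symbol."""
    chance = (symbol_count / length) ** 2
    return math.log(1 / chance ** repeats)

# The + symbol: 24 occurrences, four repeats at period 78.
plus_score = repeat_score(24, 4)
```

Under this reading the + example comes out at roughly 21.2. The natural log turns the product of small probabilities into a sum, so each extra repeat adds the same fixed amount to the score, and rare symbols add more per repeat than common ones like the +.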
I am going to keep working on this for a while. I am thinking of some ideas to explain the period 78 unigram repeats and the period 19 bigram repeats as if they are both produced by the same combination cipher. I score the period 19 bigram repeats similarly, but you cannot shuffle the message and get a distribution of similar scores like you can with the period 78 unigram repeats.
Question for you: What about spikes at near but not perfect increments? Say 20, 41, 60, 81 and 100, for example. Can they still be considered clues that a message is Vigenere and has a key length of 20, even if the increments are not perfect?