Well, thank you for all of that work. It sounds to me like the wildcard hypothesis, at least with these symbols, may be resolved. I am glad of that to some extent, and I will think about it and possibly pursue it some more.
I performed an analysis to answer the question of whether the 340 wildcard suspects could have been used to mask bigrams, which would explain why they appear in the bigram repeat list, or whether they simply occur in bigram repeats because of their high count. It looks like it is because of their high count. Look at the results below, where I compared the suite messages that have high count 1:1 substitutes. All in all, high count 1:1 substitutes account for about half of the bigram repeats across all of the examples. Note that although the 340 has only 46 bigram repeats, C_S4_P3 has only 52 and R3_S4_P3 has only 51. So the 340 bigram repeat list is not necessarily extraordinarily short. Its length depends on the message and the key, and maybe on a second step.
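For anyone who wants to repeat this kind of tally, here is a minimal Python sketch. It is not the exact procedure used for the numbers above. It assumes a "bigram repeat" means every extra occurrence of a bigram after its first, that the cipher is read as one continuous stream with no row breaks, and that "high count" means the `top_k` most frequent symbols. The function names and the `top_k` parameter are made up for illustration.

```python
from collections import Counter


def bigram_repeats(cipher):
    """Map each repeated bigram to its number of extra occurrences.

    Assumption: a bigram seen n times contributes n - 1 repeats.
    `cipher` can be a string or a list of symbols.
    """
    counts = Counter(zip(cipher, cipher[1:]))
    return {bigram: n - 1 for bigram, n in counts.items() if n > 1}


def high_count_share(cipher, top_k=5):
    """Return (total repeats, repeats involving a high count symbol, share).

    Assumption: "high count" means the top_k most frequent symbols.
    """
    high = {symbol for symbol, _ in Counter(cipher).most_common(top_k)}
    repeats = bigram_repeats(cipher)
    total = sum(repeats.values())
    involving = sum(
        n for (first, second), n in repeats.items()
        if first in high or second in high
    )
    share = involving / total if total else 0.0
    return total, involving, share
```

Running `high_count_share` on each suite message and on the 340 with the same `top_k` would give numbers that can be compared side by side, as long as the same definition of a repeat is used for all of them.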
I probably should have done this first before making the Purple Haze and Tolkien messages, but that's o.k.; I learned a great deal. Has anyone ever written a computer program that randomly creates thousands of messages based on different variables, including key, cycling and randomization, and tallies up statistics to compare with the 340? Maybe that could establish some parameters that would give some perspective. I don't know. Instead of hillclimbing the solution, what about hillclimbing the cipher to match the 340 stats?
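To make the idea concrete, here is a rough Python sketch of what such a program might look like. It is only one possible design, and everything in it is an assumption: the key gives more symbols to more frequent letters, each letter's symbols are used in a fixed cycle, and `p_random` is the chance that a given letter breaks its cycle and takes a random symbol instead. All of the names (`build_key`, `encode`, `simulate`, `p_random`) are hypothetical, and the only statistic tallied is the bigram repeat count, taken here as every extra occurrence of a bigram after its first.

```python
import random
from collections import Counter


def build_key(plaintext, n_symbols):
    """Give each letter one symbol, then hand out the rest by frequency."""
    freq = Counter(plaintext)
    if n_symbols < len(freq):
        raise ValueError("need at least one symbol per distinct letter")
    alloc = {ch: 1 for ch in freq}
    for _ in range(n_symbols - len(freq)):
        # Next symbol goes to the letter carrying the most plaintext per symbol.
        ch = max(freq, key=lambda c: freq[c] / alloc[c])
        alloc[ch] += 1
    key, next_symbol = {}, 0
    for ch, k in alloc.items():
        key[ch] = list(range(next_symbol, next_symbol + k))
        next_symbol += k
    return key


def encode(plaintext, key, p_random, rng):
    """Encode with cycling; with probability p_random pick a random homophone."""
    position = {ch: 0 for ch in key}
    out = []
    for ch in plaintext:
        homophones = key[ch]
        if rng.random() < p_random:
            out.append(rng.choice(homophones))
        else:
            out.append(homophones[position[ch] % len(homophones)])
            position[ch] += 1
    return out


def count_bigram_repeats(cipher):
    """Assumption: a bigram seen n times contributes n - 1 repeats."""
    counts = Counter(zip(cipher, cipher[1:]))
    return sum(n - 1 for n in counts.values() if n > 1)


def simulate(plaintext, n_symbols, p_random, trials, seed=0):
    """Encode the same plaintext `trials` times and tally bigram repeats."""
    rng = random.Random(seed)
    key = build_key(plaintext, n_symbols)
    return [
        count_bigram_repeats(encode(plaintext, key, p_random, rng))
        for _ in range(trials)
    ]
```

Calling `simulate` over a grid of `n_symbols` and `p_random` values, with several different plaintexts, would give a spread of bigram repeat counts to set beside the 340's count. Note that this sketch keeps the key fixed for a given plaintext; varying the key as well would need a randomized `build_key`.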
There are cycles in the 340, but they are fractured somehow. I will be thinking of another way that Zodiac could have used a second step, after encoding the cycles, to make the message unsolvable, and of some other way of examining the pieces of cycles that remain. There are a lot of symbols. I thought that maybe Zodiac could have used a lot of low count symbols to mask bigram repeats, but that's as far as my thoughts have gone. I see other possibilities.
Thanks again.
