smokie treats wrote:What I was trying to do with my scoring system was reward both total length and purity for purposes of flushing out the most likely candidates for Zodiac made cycles. For instance, ABABABAB, eight symbols in count, would have the same score as ABCDABCD, a 75% because six of the eight symbols in each (not including the first and last symbol) have a symbol to both the left and right of it that are the symbol that it should be if in the cycle. The logic being that short cycles such as ABAB would only score 50% and therefore not be of much use to me, since those could easily be random. A cycle ABABAB*B would have a score of 5/8=63%, which takes into account the missing symbol and makes the cycle rank lower on my sorted list of scored cycles; I would prefer to work with long perfect cycles if possible.
However... this is an artificial way to score the cycles and may not show their relative strengths; I have no idea if ABABABAB is any more or less likely to occur at random than ABCDABCD.
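Here is one way to read the scoring rule quoted above, written as a small Python sketch. It assumes the intended cycle is passed in explicitly (the `cycle` argument and the `cycle_score` name are hypothetical; the quote doesn't say how the cycle is chosen). An interior symbol, meaning any symbol except the first and last, earns a point when both of its neighbours match what the repeating cycle predicts at those positions. The score is points divided by total length. Under this reading the function reproduces the 75%, 50% and 63% figures from the quote.

```python
def cycle_score(observed: str, cycle: str) -> float:
    """Hypothetical scorer: points for interior symbols whose left and right
    neighbours both match the repeating cycle, divided by total length."""
    n = len(observed)
    # Expected sequence: the intended cycle repeated out to the observed length.
    expected = (cycle * (n // len(cycle) + 1))[:n]
    points = 0
    for i in range(1, n - 1):  # skip the first and last symbol
        if observed[i - 1] == expected[i - 1] and observed[i + 1] == expected[i + 1]:
            points += 1
    return points / n


for seq, cyc in [("ABABABAB", "AB"), ("ABCDABCD", "ABCD"), ("ABAB", "AB"), ("ABABAB*B", "AB")]:
    print(seq, f"{cycle_score(seq, cyc):.1%}")  # 75.0%, 75.0%, 50.0%, 62.5%
```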
We can compute that directly from a very simple example, and then build from it. Let's consider a cipher with the alphabet {A,B,C,D}. To support the appearance of the cycles ABABABAB and ABCDABCD, the minimum frequency counts for the alphabet have to be: {4,4,2,2}. So, to use all those symbols, let's start with a cipher length of 4+4+2+2 = 12.
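The minimum frequencies come from taking, for each symbol, the largest count any single cycle needs. Python's `Counter` union (`|`) keeps the per-symbol maximum, so a quick sketch of that step is:

```python
from collections import Counter

# The cycles the mini cipher must be able to hold (one at a time).
cycles = ["ABABABAB", "ABCDABCD"]

# Counter union keeps the maximum count of each symbol across the cycles,
# i.e. the minimum frequency each symbol needs so that any one cycle fits.
min_counts = Counter()
for c in cycles:
    min_counts |= Counter(c)

print(dict(sorted(min_counts.items())))  # {'A': 4, 'B': 4, 'C': 2, 'D': 2}
print(sum(min_counts.values()))          # 12
```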
Since our "mini cipher" has length 12, and ABABABAB is of length 8, the cycle ABABABAB can appear in 5 different spots:
ABABABAB****
*ABABABAB***
**ABABABAB**
***ABABABAB*
****ABABABAB
Since we've used up all the A's and B's in those 5 possibilities, the remaining 4 spots are filled with C's and D's. There are 6 ways for those to appear: CCDD, CDCD, CDDC, DCCD, DCDC, and DDCC. That means that ABABABAB appears in 5*6 = 30 of all the possible cipher configurations.
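The "spots times fillings" count can be written as a small helper. This is a sketch; `placements` and `multinomial` are hypothetical names. It multiplies the number of start positions by the number of orderings of the leftover symbols. That product counts (position, configuration) pairs, so it equals the number of distinct configurations only when a pattern can't show up twice in the same cipher, which is the case for ABABABAB here.

```python
from collections import Counter
from math import factorial


def multinomial(counts) -> int:
    """Number of distinct orderings of a multiset with the given counts."""
    total = factorial(sum(counts))
    for k in counts:
        total //= factorial(k)
    return total


def placements(cipher_counts: dict, pattern: str) -> int:
    """(start positions for `pattern`) x (orderings of the leftover symbols)."""
    n = sum(cipher_counts.values())
    leftover = Counter(cipher_counts)
    leftover.subtract(Counter(pattern))
    if any(v < 0 for v in leftover.values()):
        return 0  # not enough of some symbol, e.g. CDCDCDCD
    starts = n - len(pattern) + 1
    return starts * multinomial(leftover.values())


counts = {"A": 4, "B": 4, "C": 2, "D": 2}
print(placements(counts, "ABABABAB"))  # 30 = 5 spots x 6 orderings of CCDD
print(placements(counts, "ABCDABCD"))  # 30 = 5 spots x 6 orderings of AABB
print(placements(counts, "CDCDCDCD"))  # 0
```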
By the same logic, the cycle BABABABA also appears in 30 of all possible cipher configurations. CDCDCDCD cannot be formed because we don't have enough Cs and Ds.
So, there are 2*30 = 60 possible cipher configurations that would contain ABABABAB or BABABABA.
Now let's look at ABCDABCD. It, too, can appear in 5 different spots. Then, the available remaining symbols are 2 As, 2 Bs. With the same reasoning as above, there are 6 ways for the remaining As and Bs to appear, so ABCDABCD appears in 30 of all possible cipher configurations.
But we would also notice cycles such as DCBADCBA, right? So we need to count those as well, and now we realize there are many more possibilities: 24 of them, one for each ordering of A, B, C and D:
ABCDABCD, ABDCABDC, ACBDACBD, ACDBACDB, ADBCADBC, ADCBADCB, BACDBACD, BADCBADC, BCADBCAD, BCDABCDA, BDACBDAC, BDCABDCA, CABDCABD, CADBCADB, CBADCBAD, CBDACBDA, CDABCDAB, CDBACDBA, DABCDABC, DACBDACB, DBACDBAC, DBCADBCA, DCABDCAB, DCBADCBA
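The 24 patterns in that list are just the 4! orderings of the four symbols, each written out twice. Because `itertools.permutations` yields orderings of a sorted input in lexicographic order, this short sketch produces them in the same order as listed:

```python
from itertools import permutations

# One pattern "of the form ABCDABCD" per ordering of the four symbols.
patterns = ["".join(p) * 2 for p in permutations("ABCD")]

print(len(patterns))  # 24
print(patterns[:3])   # ['ABCDABCD', 'ABDCABDC', 'ACBDACBD']
print(patterns[-1])   # DCBADCBA
```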
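So, there are 24*30 = 720 ways to place one of these patterns in the cipher. That is more than the number of distinct cipher configurations, though, because overlapping patterns get counted more than once: ABCDABCDAABB contains both ABCDABCD and BCDABCDA, and a run like ABCDABCDAB contains three of them (ABCDABCD, BCDABCDA and CDABCDAB). Counting each configuration only once gives 576 possible cipher configurations that contain this kind of cycle pattern. (The count of 60 above has no such overlap: with only four As and four Bs, a configuration can contain at most one of ABABABAB or BABABABA.) Note that 576/60 = 9.6. There are almost 10 times as many configurations containing a pattern of the form ABCDABCD as configurations containing a pattern of the form ABABABAB.
That means ABABABAB is much less likely to occur by chance than ABCDABCD, provided that you are considering all the other forms of the patterns.
Does this make sense?
To verify my analysis, I wrote a quick program to generate all the unique cipher texts from the alphabet described above. Here are the outputs:
http://zodiackillerciphers.com/mini-cipher.txt
There are 207,900 unique cipher texts from that alphabet (12!/(4!·4!·2!·2!) = 207,900). The probability of ABABABAB or BABABABA occurring by chance is 60/207900 ≈ 0.03%. The probability of ABCDABCD (or one of the other 23 pattern types like that) occurring by chance is 576/207900 ≈ 0.28%.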
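For anyone who wants to reproduce the enumeration, here is a minimal Python sketch. It is not the program used for the linked output, and `all_ciphers` is a hypothetical name. It fills positions one symbol at a time using combinations, so each of the distinct cipher texts is generated exactly once. It then counts cipher texts that contain at least one pattern of each kind, so a text holding several overlapping patterns is counted only once.

```python
from itertools import combinations, permutations


def all_ciphers(counts):
    """Yield every distinct string that uses exactly the given symbol counts."""
    symbols = list(counts)
    n = sum(counts.values())
    cells = [""] * n

    def place(i, free):
        # Choose positions for symbols[i] among the still-free positions, then
        # recurse on what is left. Every free cell is overwritten before a
        # string is yielded, so no reset step is needed.
        if i == len(symbols):
            yield "".join(cells)
            return
        sym = symbols[i]
        for chosen in combinations(free, counts[sym]):
            for pos in chosen:
                cells[pos] = sym
            rest = [p for p in free if p not in chosen]
            yield from place(i + 1, rest)

    yield from place(0, list(range(n)))


counts = {"A": 4, "B": 4, "C": 2, "D": 2}
two_symbol = ["ABABABAB", "BABABABA"]
four_symbol = ["".join(p) * 2 for p in permutations("ABCD")]

total = hits2 = hits4 = 0
for text in all_ciphers(counts):
    total += 1
    # Each cipher text counts once, however many patterns it contains.
    hits2 += any(p in text for p in two_symbol)
    hits4 += any(p in text for p in four_symbol)

print(total, hits2, hits4)                          # 207900 60 576
print(f"{hits2 / total:.2%}  {hits4 / total:.2%}")  # 0.03%  0.28%
```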
Of course, these probabilities change as we adjust the frequencies of the symbols in the cipher alphabet. I think it would be a worthy exercise to compute the exact probabilities based on the cipher length and symbol frequencies, for any cipher text. Then it will give you an exact way to compare different cycles to each other.
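Full enumeration stops being practical once the cipher gets longer, because the number of arrangements explodes. As a stand-in until an exact formula is worked out, a Monte Carlo estimate can approximate the same probability for any symbol frequencies. This is a swapped-in technique: it is approximate, not exact, and `estimate_probability` is a hypothetical name. A uniform shuffle of the cipher's symbols makes every distinct cipher text equally likely. For the mini cipher, the estimate can be checked against an exhaustive count. Rare patterns need many trials, since the standard error is roughly sqrt(p(1 - p)/trials).

```python
import random
from itertools import permutations


def estimate_probability(counts, patterns, trials=100_000, seed=1):
    """Monte Carlo estimate of P(random cipher contains any of `patterns`).

    Approximate, not exact: shuffles the cipher's symbols `trials` times and
    returns the fraction of shuffles in which at least one pattern appears.
    """
    rng = random.Random(seed)
    symbols = [s for s, k in counts.items() for _ in range(k)]
    hits = 0
    for _ in range(trials):
        rng.shuffle(symbols)
        text = "".join(symbols)
        if any(p in text for p in patterns):
            hits += 1
    return hits / trials


counts = {"A": 4, "B": 4, "C": 2, "D": 2}
four_symbol = ["".join(p) * 2 for p in permutations("ABCD")]
p4 = estimate_probability(counts, four_symbol)
print(f"estimated probability: {p4:.3%}")
```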