
Re: CIPHER STRUCTURE

Posted: Wed Jul 29, 2015 4:04 am
by Jarlve
I love these results!

They do somewhat confirm that the 340 has no columnar transposition going on after encoding, since the Mystery cipher is just that and that seems to juggle the cycles around awkwardly.

It's true what you say smokie, that cycles are interrupted starting at row 11. But if the 340 is a 2 part cipher and the encoding started anew at row 11, we probably would see halves that are about equal. For now it seems that the 340 correlates best to the cipher with some randomization in the cycles and a few 1:1's.

I'll create a cipher which is more prone to randomization in the cycles near the end. Give me some time.

Re: CIPHER STRUCTURE

Posted: Wed Jul 29, 2015 4:19 am
by smokie treats
Thanks, Jarlve. I know that you are trying to do a lot of things right now, so whenever you get the chance is fine.

EDIT:

Jarlve wrote:Just want to note that smokie's latest work on the ciphers in viewtopic.php?f=81&t=267&start=160 (his last post) also seems to add evidence against columnar transposition being actual after encoding for the 340. It seems to juggle the cycles around, though it's certainly not conclusive until someone decides to do a really big test to see what actually happens to the cycles with columnar transposition after encoding.


Jarlve, if you want me to, I can produce these simple two symbol cycle stats for different test messages, and line them up side by side with the 340. I can include whole, first half, and second half stats. You can make changes to the messages and I will show what happens to the cycle stats. You don't have to tell me what you did beforehand, and I won't try to solve them.

Maybe that will help to clear things up about whether Zodiac rearranged the ciphertext after encoding or any other questions that you may have. And maybe someone who finds interesting results can later take the baton and perform more sophisticated testing.

When and if you want to.

I think that the cycles are a gift. Maybe we will never be able to sort all of the Zodiac made cycles from the random cycles or the cycle overlap made cycles (I think there may be a difference between the two). But maybe studying the cycle stats will help us figure out what the 340 really is.

I also am thinking about a new idea for scoring two symbol cycles. Just a rough draft of an idea. But with my spreadsheet I think that I can easily identify the start and end positions of each cycle. There may be a lot of different ways to use that information. For instance with cycle type ABABAB, distance and mean distance between start and end position, mean start position, mean end position, first half only, second half only, other? Can there be some meaningful use for the information? Is a cycle ABABAB that starts in row 1 and ends in row 10 any more or less likely to be Zodiac-made than a cycle that starts in row 1 and ends in row 20? Can test messages be used to figure that out? Other questions?

Like I said, a rough draft of an idea, and I'll have to work on that later. I may be setting myself up for a lot of fun work when I have other obligations. :shock:

Re: CIPHER STRUCTURE

Posted: Thu Jul 30, 2015 5:38 am
by Jarlve
Yes, the cycles are very much a gift. I think your distances idea could be very fruitful, it's something I was planning to do myself eventually. On average a true cycle should be more equally spread throughout the cipher. I have such a measurement for individual symbols and it really does show that in a cyclic cipher the symbols are more equally spread throughout the cipher.

Obligations come first, take your time, no strings attached!

Re: CIPHER STRUCTURE

Posted: Thu Jul 30, 2015 1:34 pm
by smokie treats
Yes, I see what you mean.

A.....B.....A.....B.....A.....B is more likely a true cycle compared to A..B.......A..................B..A....................B.

That makes sense because of plaintext frequencies. I will definitely have to explore that concept, perhaps with my hillclimber. I could just make a new scoring formula and try it out on different messages. See if the hillclimber flushes out more or fewer true cycles.
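As a rough illustration of the "evenly spread" idea, the sketch below measures how regular the gaps between occurrences of one symbol are in the two patterns above. The name `gap_cv` and the choice of coefficient of variation are assumptions made for this example; it is not the measurement Jarlve mentions, which is not described in the thread.

```python
from statistics import mean, pstdev

def gap_cv(text, symbol):
    """Coefficient of variation of the gaps between occurrences of `symbol`.

    0.0 means perfectly even spacing; larger values mean more uneven spacing.
    (Hypothetical measure, chosen only for illustration.)
    """
    positions = [i for i, ch in enumerate(text) if ch == symbol]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if not gaps:
        return 0.0
    return pstdev(gaps) / mean(gaps)

even = "A.....B.....A.....B.....A.....B"
uneven = "A..B.......A..................B..A....................B."

for name, text in (("even", even), ("uneven", uneven)):
    print(name, round(gap_cv(text, "A"), 3), round(gap_cv(text, "B"), 3))
```

The evenly spaced pattern gives 0.0 for both symbols, while the uneven one gives a larger value. Whether such a number actually separates true cycles from false ones is a separate question.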

Thanks for the input!

Re: CIPHER STRUCTURE

Posted: Thu Jul 30, 2015 1:55 pm
by daikon
smokie treats wrote:A.....B.....A.....B.....A.....B is more likely a true cycle compared to A..B.......A..................B..A....................B.
That makes sense because of plaintext frequencies.


I don't quite follow everything you guys are doing here, so apologies if I misunderstood what you meant, but I don't think it's quite true what you just said. It would be, if the letters were evenly distributed throughout the plaintext, but they are usually not, for an average English text.
For example, take your last sentence. I'll highlight all E's: "thatmakEssEnsEbEcausEofplaintExtfrEquEnciEs", or ".......E..E..E.E....E........E....E..E...E.". Very uneven distribution.
It is even more so for the letter S: "........SS..S......S......................S"
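The dot patterns above can be reproduced for any letter with a few lines of code. This is only a small sketch; the helper name `mark_letter` is made up for the example.

```python
def mark_letter(text, letter):
    """Return a dot pattern with `letter` shown at each position it occupies."""
    # Drop spaces and punctuation so positions match the run-together plaintext.
    cleaned = "".join(ch for ch in text.lower() if ch.isalpha())
    return "".join(ch.upper() if ch == letter.lower() else "." for ch in cleaned)

sentence = "That makes sense because of plaintext frequencies."
print(mark_letter(sentence, "e"))
print(mark_letter(sentence, "s"))
```

Running it on other sentences is a quick way to see how unevenly individual letters tend to be spaced in ordinary English.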

Re: CIPHER STRUCTURE

Posted: Thu Jul 30, 2015 7:58 pm
by smokie treats
Yeah, you are right.

Daikon, if you can think of a way to score cycles so that we can separate the true cycles from the false cycles, it would be greatly appreciated.

So far I have used a simple formula to calculate the percentage of ciphertext that is bracketed by the ciphertext that it should be bracketed by. For example, in ABABAB the four interior symbols each sit between two of the other symbol, so the score is 4/6 = .67. That actually works pretty well to help flush out the symbols that are not in any strong cycles, like the q and the +. I just add up the total score for each symbol, and compare with the total score for all of the other symbols. For example, for symbol 1, score of 1 with 2, 1 with 3, 1 with 4, etc. all added up. The symbols with the lowest total scores don't cycle well with other symbols.
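One possible reading of the percentage formula, as a minimal sketch: count the symbols that have the other symbol on both sides, and divide by the length of the two-symbol sequence. The function name `bracket_score` and the handling of the first and last symbols (never counted, since they have only one neighbour) are assumptions, not smokie's confirmed spreadsheet logic.

```python
def bracket_score(sequence):
    """Fraction of symbols with the *other* symbol on both sides.

    `sequence` is the order in which two symbols appear, e.g. "ABABAB".
    This is one reading of the 'bracketed' percentage, not a confirmed formula.
    """
    if not sequence:
        return 0.0
    bracketed = sum(
        1
        for i in range(1, len(sequence) - 1)
        if sequence[i - 1] != sequence[i] != sequence[i + 1]
    )
    return bracketed / len(sequence)

print(round(bracket_score("ABABAB"), 3))  # 4 of 6 symbols are bracketed
print(round(bracket_score("ABAABB"), 3))  # only 1 of 6 is bracketed
```

Summing this score over every partner of a symbol would give the per-symbol total described above.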

I also use probability scores. For example, ABABAB = 2^6 = 64, whereas ABAB only scores 2^4 = 16. My cycle "hillclimber" can flush out maybe a dozen or so true two-symbol cycles. I think that is pretty good, considering all of the false two symbol cycles.
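A sketch of the probability score, under the assumption that an imperfect sequence is scored by its longest unbroken alternating run (the post only gives the perfect cases ABABAB and ABAB). Both function names are hypothetical.

```python
def longest_alternating_run(sequence):
    """Length of the longest stretch in which adjacent symbols always differ."""
    if not sequence:
        return 0
    best = run = 1
    for prev, cur in zip(sequence, sequence[1:]):
        run = run + 1 if cur != prev else 1
        best = max(best, run)
    return best

def probability_score(sequence):
    """2 ** n for the longest alternating run of n symbols (assumed rule)."""
    return 2 ** longest_alternating_run(sequence)

print(probability_score("ABABAB"))  # 2^6
print(probability_score("ABAB"))    # 2^4
```

The exponential growth means a single long run dominates the score, which may be part of why the two formulas behave differently in a hillclimber.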

It's sort of like panning for gold. You have to get the light material to float to the top and out of the pan so that there are only a few nuggets at the bottom. I can only work with two symbol cycles in any practical way. But that's o.k. because ABCD includes AB AC AD BC BD and CD and sometimes I can find all of them together in practice messages.
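To see why a four-symbol cycle shows up as two-symbol cycles, the sketch below projects a perfect ABCD cycle onto every pair of its symbols and checks that each projection alternates. The helper names are made up for the example.

```python
from itertools import combinations

def project(sequence, pair):
    """Keep only the symbols of `sequence` that belong to `pair`."""
    return "".join(ch for ch in sequence if ch in pair)

def alternates_perfectly(sequence):
    """True if no symbol appears twice in a row."""
    return all(a != b for a, b in zip(sequence, sequence[1:]))

cycle = "ABCD" * 3  # a perfect four-symbol cycle, repeated three times
for pair in combinations("ABCD", 2):
    sub = project(cycle, pair)
    print("".join(pair), sub, alternates_perfectly(sub))
```

Each of the six pairs comes out as a perfect two-symbol cycle, so finding several overlapping pairs is a hint that a longer cycle sits underneath.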

If you can think of any other way to score the cycles, to study the cycles to determine what Zodiac did besides cyclic homophonic substitution, or to use the positions of the cycle symbols in the message in any useful way, let me know. Thank you in advance for any participation.

Smokie

Re: CIPHER STRUCTURE

Posted: Thu Jul 30, 2015 9:39 pm
by doranchak
I think cosine similarity is useful for corroborating cycle candidates. The measurement compares symbols to each other and tries to find symbols that "act" like other symbols. For example, if symbol A tends to be followed (preceded) by the same symbols that symbol B tends to be followed (preceded) by, then there's a better chance that they stand for the same plain text letter. Basically, you build vectors based on symbol counts and you measure the distances between them (smaller distances mean greater similarity).

The Copiale cipher was cracked in part by identifying homophones using cosine similarity measurements. Here's the paper that describes the technique: http://stp.lingfil.uu.se/~bea/publ/copiale-11.pdf

The 340 is probably too short for reliable cosine similarity measurements but maybe they are useful when combined with other techniques.

Re: CIPHER STRUCTURE

Posted: Fri Jul 31, 2015 6:59 pm
by smokie treats
Thanks for the article! It was easy to read and very interesting. I can imagine people in European high society, such as kings and queens, using ciphers to send messages to each other in the 1500's. I understand about the cosine relationships a little bit. I know what cosine is, but don't see any triangle relationships with the vectors. But, I took Calculus 30 years ago. Really a very well written article.

I have been having fun with my cycle hillclimber and different ways to score cycles. I shall post some interesting findings a bit later. I made a message with perfect cycles, and tried my cycle hillclimber with the probability and percentages formulas. Surprisingly, the percentage formula performed much better and paired up a whopping 39 symbols. My next attempt will be at including a value that rewards total distance between the first symbol and the last symbol in a cycle.

EDIT: Jarlve, are you out there? I have a new formula for my cycle hillclimber, and with a message with only perfect cycles (no high count 1:1 or whatever), I was able to gather together 43 symbols. And 9 of the 63 symbols are low count 1:1, for B, F, J, K, etc.

Score = (number of alternations / (count symbol 1 + count symbol 2)) * ((high position - low position) / 17)

The distance between the high position and low position is figured in total number of rows, so that a cycle that spans from row 1 to row 20 is given higher preference compared to a cycle with the same number of alternations and total count but that starts in say row 2 and ends in row 19. That's what the 17 is for: it is the number of symbols in each row of the 340.

For cycle 1 2 1 2 1 1 1 2 2 in my test message, I get 4 consecutive alternations. Total symbol count is 5 + 4 = 9. High position for the last 2 is 277 and low position for the first 1 is 8. Score = ( 4 / 9 ) * ( 277 - 8 ) / 17 = 7.033.
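The worked example can be checked with a short sketch. It assumes that "consecutive alternations" means the longest unbroken run of symbol changes, which matches the 4 counted above. Only the positions 8 and 277 come from the post; the other positions in `example` are invented so the code has something to run on, and the function names are hypothetical.

```python
ROW_WIDTH = 17  # symbols per row in the 340's 17 x 20 grid

def longest_alternation_run(symbols):
    """Largest number of consecutive symbol changes (e.g. 1 2 1 2 1 -> 4)."""
    best = run = 0
    for prev, cur in zip(symbols, symbols[1:]):
        run = run + 1 if cur != prev else 0
        best = max(best, run)
    return best

def cycle_score(occurrences):
    """occurrences: (position, symbol) pairs for two symbols, in message order."""
    positions = [pos for pos, _ in occurrences]
    symbols = [sym for _, sym in occurrences]
    alternations = longest_alternation_run(symbols)
    rows_spanned = (max(positions) - min(positions)) / ROW_WIDTH
    return alternations / len(symbols) * rows_spanned

# Only positions 8 and 277 come from the post; the rest are made up.
example = [(8, 1), (40, 2), (71, 1), (95, 2), (130, 1),
           (160, 1), (201, 1), (240, 2), (277, 2)]
print(round(cycle_score(example), 3))  # 7.033
```

Because only the first and last positions enter the formula, the invented middle positions do not change the result.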

When I use my cycle hillclimber, only 10 cycle symbols are left standing by themselves without sitting next to a cycle partner. Not all of the cycle symbols are gathered together, so I could get AA in one place, and AA in another place. But the list of possible merges is a lot better with this formula.

Please shoot me a grid of numbers when you can, perfect cycles and that's all. No other tricks. See if I can identify a lot of the possible merges. I'll post the graphic of my hillclimber results for the Purple H experiment in a bit so you can see.

Smokie

Re: CIPHER STRUCTURE

Posted: Fri Jul 31, 2015 8:09 pm
by daikon
smokie treats wrote:I understand about the cosine relationships a little bit. I know what cosine is, but don't see any triangle relationships with the vectors.


It's a tricky one. Took me a bit to understand what they meant by "cosine relationship" (or "cosine similarity"). I'll try to explain. They are basically using a dot product. A dot product of two vectors is equal to the cosine of the angle between them. Well, for normalized vectors (i.e. vectors of length 1), otherwise you also need to divide by their lengths. Now it is important to remember that cosine of 0 degrees is 1, cosine of 90 degrees is 0, and cosine of 180 degrees is -1. Another way to look at it: the dot product represents one of the vectors projected onto the other. Sort of like a shadow. If the vectors are perpendicular, the projection (shadow) of either one of them onto the other will be zero. If they are pointing in the same general direction, the projection (shadow) will be the longest.

If you combine all this together, you'll see that "cosine relationship" is just a number representing whether 2 vectors are pointing in the same general direction (angle is close to 0 degrees, and cosine is 1), or if they are perpendicular (90 degrees, cosine is 0), or if they are pointing in generally opposite directions (180 degrees, cosine is -1). How to calculate the cosine similarity value for two vectors? That's where the dot product comes in. You just multiply individual coordinates of 2 vectors and add it all together. So if you have two vectors of ( 1, 2, 3, 4, 5) and ( 9, 8, 7, 6, 5 ), their dot product is: 1*9 + 2*8 + 3*7 + 4*6 + 5*5. You'll also need to divide that number by the length of each of the 2 vectors to normalize it (i.e. bring to -1..1 range). No need to calculate the actual angle, or compute cosines of any angles.
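The calculation for the two example vectors can be written out as a short sketch. The function name `cosine_similarity` is just a label for this example, and returning 0.0 for an all-zero vector is an assumption made to avoid dividing by zero.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_u * norm_v)

u = (1, 2, 3, 4, 5)
v = (9, 8, 7, 6, 5)
print(sum(a * b for a, b in zip(u, v)))   # 95
print(round(cosine_similarity(u, v), 3))  # 0.802
```

The raw dot product is 95, and dividing by the two lengths (the square roots of 55 and 255) brings it to about 0.80, so these two vectors point in broadly the same direction.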

Why does that number tell you whether the vectors are pointing in the same direction, or if they are perpendicular? Magic? Not quite. :) Let's simplify to 2 dimensions. Let's make one vector point along X axis: (1,0), and the other along Y axis: (0,1). They are obviously perpendicular to each other (i.e. not at all "similar"). Their dot product is, as expected: 1*0 + 0*1 = 0. Confirmed, they are perpendicular (cosine of 90 degrees is 0). You see, it's quite simple. Since you are multiplying individual coordinates of the two vectors, if they are "not in sync" with their corresponding coordinates (i.e. one of the coordinates is close to 0 for one of the vectors, but the other one's isn't), you end up adding up 0s or very small numbers, which means vectors are mostly perpendicular, or "not similar". However, if both of the vectors have individual coordinates "in sync" (i.e. they are large numbers in both vectors), their dot product will end up close to 1, and these vectors will be "similar".

Going back to ciphers, you take all symbols and count how many times each of them follows (or leads) other symbols. You end up with "vectors" of N numbers representing each symbol (where N is the number of unique symbols in the cipher). Then you can compute dot products between each of the symbols to figure out which ones are "cosine similar" (dot product is close to 1), and those that are grouped together are likely to stand for the same plaintext letter, as they "behave" similarly (i.e. they are likely homophones). The idea is that different letters "like" to follow (or lead) certain other letters (bigram frequencies are not smooth), so it's like a fingerprint for each letter. And you can group symbols together by their "cosine similarity" fingerprint, as their fingerprints (given a long enough text) will be similar to each other. I haven't done this test on Z340 myself, but I suspect it might be just too short to get reliable similarity numbers.
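A minimal sketch of that procedure, assuming the ciphertext is a list of symbol numbers and using only "which symbol follows" counts. The toy ciphertext is made up, and the function names are hypothetical; nothing here has been run on the 340.

```python
import math
from collections import Counter
from itertools import combinations

def follower_vectors(cipher):
    """Map each symbol to a vector counting which symbols follow it."""
    alphabet = sorted(set(cipher))
    counts = {s: Counter() for s in alphabet}
    for cur, nxt in zip(cipher, cipher[1:]):
        counts[cur][nxt] += 1
    return {s: [counts[s][t] for t in alphabet] for s in alphabet}

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def rank_pairs(cipher):
    """All symbol pairs, most similar follower profile first."""
    vectors = follower_vectors(cipher)
    scored = [(cosine_similarity(vectors[a], vectors[b]), a, b)
              for a, b in combinations(sorted(vectors), 2)]
    return sorted(scored, reverse=True)

# Toy ciphertext (made up): 1 and 2 share followers (3 and 4); 1 and 3 share none.
toy = [1, 3, 2, 3, 1, 4, 2, 4, 1, 3, 2, 4, 5, 1, 5, 2]
for score, a, b in rank_pairs(toy):
    print(a, b, round(score, 3))
```

With so little text the ranking is noisy, which is the same concern raised about the 340: the counts in each vector are small, so a single extra bigram can move a pair up or down the list.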

You can probably extend this to bigrams, to get an even more accurate result (trigram frequencies are even less smooth). I.e. count how many times a given symbol follows (or leads) a given bigram (a pair of symbols). But you'll need to have an even longer text for that, so Z340 is definitely out.

Re: CIPHER STRUCTURE

Posted: Fri Jul 31, 2015 8:19 pm
by daikon
smokie treats wrote:If you can think of any other way to score the cycles, to study the cycles to determine what Zodiac did besides cyclic homophonic substitution, or to use the positions of the cycle symbols in the message in any useful way, let me know.


I think it's a great idea to try to identify cycles and therefore possible candidates for homophones, as it would help reduce the multiplicity of the cipher. In fact, that's how my "unsolvable" cipher was eventually cracked — by manual analysis of homophone cycles (which I didn't try to hide at all) and following different possibilities of merging them. But I haven't done any research in that direction myself, so I can't really offer any insight, I'm afraid. You definitely seem to have a much better grasp of the cycles.