smokie treats wrote: I understand about the cosine relationships a little bit. I know what cosine is, but don't see any triangle relationships with the vectors.
It's a tricky one. Took me a bit to understand what they meant by "cosine relationship" (or "cosine similarity"). I'll try to explain. They are basically using a dot product. For normalized vectors (i.e. vectors of length 1), the dot product of two vectors is equal to the cosine of the angle between them. For vectors of any other length, you also need to divide the dot product by both of their lengths. Now it is important to remember that cosine of 0 degrees is 1, cosine of 90 degrees is 0, and cosine of 180 degrees is -1. Another way to look at it: for unit vectors, the dot product is the length of one vector projected onto the other. Sort of like a shadow. If the vectors are perpendicular, the projection (shadow) of either one of them onto the other will be zero. If they are pointing in exactly the same direction, the projection (shadow) will be the longest.
If you combine all this together, you'll see that "cosine relationship" is just a number representing whether 2 vectors are pointing in the same general direction (angle is close to 0 degrees, and cosine is close to 1), or if they are perpendicular (90 degrees, cosine is 0), or if they are pointing in generally opposite directions (angle is close to 180 degrees, and cosine is close to -1). How to calculate the cosine similarity value for two vectors? That's where the dot product comes in. You just multiply the corresponding coordinates of the 2 vectors and add the products together. So if you have two vectors, (1, 2, 3, 4, 5) and (9, 8, 7, 6, 5), their dot product is: 1*9 + 2*8 + 3*7 + 4*6 + 5*5 = 95. You'll also need to divide that number by the length of each of the 2 vectors to normalize it (i.e. bring it into the -1..1 range). No need to calculate the actual angle, or compute cosines of any angles.
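Here is a rough Python sketch of that calculation, using the two example vectors from above. The function name cosine_similarity is just my own label for illustration, not something from any particular tool.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b, divided by the length of each vector."""
    dot = sum(x * y for x, y in zip(a, b))
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))
    return dot / (len_a * len_b)

a = (1, 2, 3, 4, 5)
b = (9, 8, 7, 6, 5)

# 1*9 + 2*8 + 3*7 + 4*6 + 5*5
dot = sum(x * y for x, y in zip(a, b))
similarity = cosine_similarity(a, b)

print(dot)         # 95
print(similarity)  # roughly 0.80
```

So these two vectors point in a fairly similar direction, even though one counts up and the other counts down, because all of their coordinates are positive.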
Why does that number tell you whether the vectors are pointing in the same direction, or if they are perpendicular? Magic? Not quite. :) Let's simplify to 2 dimensions. Let's make one vector point along the X axis: (1,0), and the other along the Y axis: (0,1). They are obviously perpendicular to each other (i.e. not at all "similar"). Their dot product is, as expected: 1*0 + 0*1 = 0. Confirmed, they are perpendicular (cosine of 90 degrees is 0). You see, it's quite simple. Since you are multiplying the corresponding coordinates of the two vectors, if they are "not in sync" (i.e. a coordinate is close to 0 in one of the vectors, but the same coordinate in the other one isn't), you end up adding up 0s or very small numbers, which means the vectors are mostly perpendicular, or "not similar". However, if the two vectors have their coordinates "in sync" (i.e. the same coordinates are large in both vectors), their dot product, once divided by the two lengths, will end up close to 1, and these vectors will be "similar".
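If you want to check the 2D cases yourself, something like this should do it. Again just a sketch, and cosine_similarity is my own helper name.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

perpendicular = cosine_similarity((1, 0), (0, 1))   # X axis vs Y axis
same_dir = cosine_similarity((3, 4), (6, 8))        # same direction, different lengths
opposite = cosine_similarity((1, 0), (-1, 0))       # pointing opposite ways

print(perpendicular, same_dir, opposite)  # 0.0 1.0 -1.0
```

Note that (3,4) and (6,8) come out as 1 even though one is twice as long as the other. Dividing by the lengths removes the size and leaves only the direction.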
Going back to ciphers, you take all symbols and count how many times each of them follows (or leads) each of the other symbols. You end up with one "vector" of N numbers for each symbol (where N is the number of unique symbols in the cipher). Then you can compute the normalized dot product for each pair of symbols to figure out which ones are "cosine similar" (value is close to 1), and those that are grouped together are likely to stand for the same plaintext letter, as they "behave" similarly (i.e. they are likely homophones). The idea is that different letters "like" to follow (or lead) certain other letters (bigram frequencies are not smooth), so it's like a fingerprint for each letter. And you can group symbols together by their "cosine similarity" fingerprint, as their fingerprints (given a long enough text) will be similar to each other. I haven't done this test on Z340 myself, but I suspect it might be just too short to get reliable similarity numbers.
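To make the counting step concrete, here is a toy sketch. Everything in it is made up for illustration: the "cipher" string is invented (symbols 1 and 2 are meant to be homophones for the same letter), the function names are mine, and I only count which symbol comes next, not which comes before.

```python
import math

def follower_vectors(cipher):
    """For each symbol, count how often every symbol appears right after it."""
    alphabet = sorted(set(cipher))
    index = {s: i for i, s in enumerate(alphabet)}
    vectors = {s: [0] * len(alphabet) for s in alphabet}
    for cur, nxt in zip(cipher, cipher[1:]):
        vectors[cur][index[nxt]] += 1
    return vectors

def cosine_similarity(a, b):
    """Normalized dot product; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Invented toy text: "1" and "2" are always followed by the same symbols.
cipher = "1A2A1B2B1A2A1B2B"
vecs = follower_vectors(cipher)

sim_homophones = cosine_similarity(vecs["1"], vecs["2"])
sim_unrelated = cosine_similarity(vecs["1"], vecs["A"])

print(sim_homophones)  # close to 1: same "fingerprint"
print(sim_unrelated)   # 0.0: nothing in common
```

On a real cipher the numbers will be much noisier than this, which is exactly why the text length matters.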
You can probably extend this to bigrams to get an even more accurate result (trigram frequencies are even less smooth). That is, count how many times a given symbol follows (or leads) a given bigram (a pair of symbols). But you'll need an even longer text for that, so Z340 is definitely out.