Homophonic substitution

Re: Homophonic substitution

Postby doranchak » Wed Dec 09, 2015 1:31 pm

Jarlve wrote:
doranchak wrote:Jarlve, the highest azdecrypt score I found in my recent results was 20900. Do you think that is a significant bump from the average for 340-character cipher texts? I think the original 340 scores around 20300.

Hard to say; my advice is to run some kind of experiment that tries to answer that question. And yes, the original 340 scores 20351.


OK - I did an experiment along those lines. The cipher that scored 20900 has 48 bigrams and 6 trigrams. So I generated 100 random shuffles with the same number of bigrams and trigrams, then ran them through azdecrypt with the same setup (30 restarts and 1000000 iterations). Result:

min: 19990
max: 20815
average: 20369

On the one hand, it makes me think 20900 is significant. On the other hand, if the 1000+ transpositions I tested are all behaving as random text, 20900 is expected to appear as an outlier.
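
One way to frame this question is as an empirical p-value, corrected for the number of candidates tested. The sketch below is illustrative only: the function names are hypothetical, and because only the min, max and average of the 100 shuffles are reported above, the null list is a stand-in in which every score sits at the reported maximum of 20815 (any 100 values at or below 20815 give the same count of exceedances).

```python
def empirical_p_value(observed, null_scores):
    """Add-one estimate of P(null score >= observed)."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def chance_of_any_hit(p, trials):
    """Probability that at least one of `trials` independent null draws is that extreme."""
    return 1.0 - (1.0 - p) ** trials


# Stand-in for the 100 shuffle scores (ASSUMPTION: only the max, 20815, is known).
null_scores = [20815] * 100
p = empirical_p_value(20900, null_scores)   # no shuffle reached 20900
any_hit = chance_of_any_hit(p, 1000)        # 1000+ transpositions were tested
print(p, any_hit)
```

With only 100 shuffles, the smallest value this estimate can report is 1/101, so it is an upper bound rather than a measurement. At that bound, a score this high would not be surprising somewhere among 1000+ candidates; more shuffles would be needed to tighten it.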
doranchak
 
Posts: 1583
Joined: Thu Mar 28, 2013 5:26 am

Re: Homophonic substitution

Postby Mr lowe » Wed Dec 09, 2015 8:46 pm

Just an idea on partial solves: since you all have built up many partial solves over time, is it possible to put them all together in a file to see whether any words or sentences align with each other? Maybe those results could help find the path.
A killer we sort stood there among the dead
Mr lowe
 
Posts: 799
Joined: Fri Aug 15, 2014 4:07 am

Re: Homophonic substitution

Postby doranchak » Thu Dec 10, 2015 6:14 am

@Mr lowe: I think that's a good idea. We would need a good way to automatically identify high-quality sentence fragments within streams of non-delimited plaintext, even if there are a few decoding errors or misspellings. Those sorts of tasks might be better suited to algorithms that aren't strictly based on ngram counts and that capture more abstract features of the English language.

Re: Homophonic substitution

Postby Jarlve » Thu Dec 10, 2015 1:35 pm

doranchak wrote:Eventually I would like to understand how you compute the measurement so I can implement it too.

I wrote a speed-oriented function (FreeBASIC) for you which outputs the score. I then also calculate the percentage difference between the cipher's score and 1000 randomizations of it, which gives a number close to 180 for the 340. I haven't taken a second look at the calculation logic yet and it will probably have its own weaknesses. But it is something different you may want to give a try.

Raw scores with weight 5:
340: 2152
408 (capped 340): 2861

Example call for the 340 cipher: m_2s_cycles(cipher(),340,63,5). A higher weight increases the score's emphasis on cycle quality; I usually go with 5 here.

Code:
function m_2s_cycles(cipher() as short,byval total_symbols as short,byval unique_symbols as short,byval weight as single)as double
   ' cipher() holds symbol numbers (up to 1000) at positions 1 to total_symbols
   dim as integer i,j,e,u,cs1,cs2
   ' cycle(a,b,n): n-th occurrence of symbol a or b, in ciphertext order (at most 100 entries per pair)
   dim as short cycle(unique_symbols,unique_symbols,1 to 100)
   dim as short cycle_length(unique_symbols,unique_symbols)
   dim as short ident(1000),ident_count(1000)
   dim as short alternations
   dim as double score,alt_per_cycle_length
   for i=1 to total_symbols
      ' map each symbol to a compact index e in 1..unique_symbols, in order of first appearance
      if ident(cipher(i))=0 then
         u+=1
         ident(cipher(i))=u
         e=u
      else
         e=ident(cipher(i))
      end if
      ident_count(cipher(i))+=1
      ' append this occurrence to the sequence of every pair that contains e
      for j=1 to unique_symbols
         cycle_length(e,j)+=1
         cycle_length(j,e)+=1
         cycle(e,j,cycle_length(e,j))=e
         cycle(j,e,cycle_length(j,e))=e
      next j
   next i
   ' for every symbol pair, count the alternations (ABAB...) in its merged sequence
   for cs1=1 to unique_symbols
      for cs2=cs1+1 to unique_symbols
         for i=1 to cycle_length(cs1,cs2)-1
            if cycle(cs1,cs2,i)<>cycle(cs1,cs2,i+1) then alternations+=1
         next i
         ' pair score: (length-1) * (fraction of alternating steps)^weight
         if alternations>0 then score+=(cycle_length(cs1,cs2)-1)*((alternations/(cycle_length(cs1,cs2)-1))^weight)
         alternations=0
      next cs2
   next cs1
   return score
end function
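
For anyone implementing this outside FreeBASIC, here is a minimal Python sketch of the same pair-cycle idea. It is a reading of the listing above, not a verified port: it has not been checked against the raw scores quoted in this post, and the function names are hypothetical. The second function is also an ASSUMPTION, since the post does not spell out how the comparison with 1000 randomizations is computed; here it is taken to be 100 times the cipher's score divided by the mean score of shuffled copies.

```python
import random
from itertools import combinations


def two_symbol_cycle_score(cipher, weight=5.0):
    """Score how regularly every pair of symbols alternates (ABAB...)."""
    score = 0.0
    for a, b in combinations(sorted(set(cipher)), 2):
        # Occurrences of a and b only, in ciphertext order.
        seq = [s for s in cipher if s == a or s == b]
        steps = len(seq) - 1
        alternations = sum(1 for x, y in zip(seq, seq[1:]) if x != y)
        if alternations > 0:
            score += steps * (alternations / steps) ** weight
    return score


def percent_of_shuffled_mean(cipher, weight=5.0, shuffles=1000, seed=0):
    """ASSUMED comparison: 100 * score / mean score of shuffled copies."""
    rng = random.Random(seed)
    symbols = list(cipher)
    total = 0.0
    for _ in range(shuffles):
        rng.shuffle(symbols)
        total += two_symbol_cycle_score(symbols, weight)
    mean = total / shuffles
    if mean == 0:
        raise ValueError("need at least two distinct symbols")
    return 100.0 * two_symbol_cycle_score(cipher, weight) / mean


# Toy input with perfect cycles, not a real cipher.
demo = list("ABC" * 20)
print(two_symbol_cycle_score(demo))
print(percent_of_shuffled_mean(demo, shuffles=200))
```

The sketch rebuilds each pair's sequence by filtering the whole cipher, which is simple but much slower than the array bookkeeping in the FreeBASIC version; for 1000 shuffles of a 340-symbol cipher that cost adds up.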
Jarlve
 
Posts: 1324
Joined: Sun Sep 07, 2014 9:51 am
Location: Belgium

Re: Homophonic substitution

Postby Jarlve » Thu Dec 10, 2015 1:57 pm

doranchak wrote:On the one hand, it makes me think 20900 is significant. On the other hand, if the 1000+ transpositions I tested are all behaving as random text, 20900 is expected to appear as an outlier.

Given your results I don't think 20900 is significant. I ran a test some days ago comparing the 340 with a cipher that smokie had transposed on 55 different "period 19" lines, and the 340 didn't show promising results.

I have been thinking it over and have come down to these possibilities (in no specific order). The bigram period 19 thing:

1. is a fluke.
2. relates to transposition and the cipher is not in English.
3. relates to a special encoding process (something like _pi mentioned).
4. relates to transposition but with some misalignments, or unexpected distribution.
5. relates to transposition and the cipher is also somewhat polyalphabetic (wildcards?).

Or even combinations of the above. I personally think option 1 is the most unlikely. Beyond that I don't know, but I think we should try to find out whether misalignment could be the culprit for the low-scoring returns.
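
For readers who want to check the period 19 observation themselves, the following Python sketch counts repeated bigrams at a given period. It ASSUMES that a "period p bigram" means the pair of symbols at positions i and i+p, and that repeats are counted as occurrences beyond the first; the exact convention used in this thread is not stated here, and the function names are hypothetical.

```python
from collections import Counter


def periodic_bigram_repeats(cipher, period):
    """Count repeats of the pairs (cipher[i], cipher[i + period])."""
    pairs = Counter(
        (cipher[i], cipher[i + period]) for i in range(len(cipher) - period)
    )
    # Each pair contributes its occurrences beyond the first.
    return sum(count - 1 for count in pairs.values() if count > 1)


def best_periods(cipher, max_period, top=5):
    """Return (repeats, period) tuples for the highest-scoring periods."""
    scored = [
        (periodic_bigram_repeats(cipher, p), p) for p in range(1, max_period + 1)
    ]
    return sorted(scored, reverse=True)[:top]
```

Running `best_periods` on the cipher and on shuffled copies of it is one way to see whether a peak at a single period stands out from chance, which bears on option 1.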

Re: Homophonic substitution

Postby smokie treats » Thu Dec 10, 2015 7:43 pm

Jarlve wrote:
doranchak wrote:On the one hand, it makes me think 20900 is significant. On the other hand, if the 1000+ transpositions I tested are all behaving as random text, 20900 is expected to appear as an outlier.

Given your results I don't think 20900 is significant. I ran a test some days ago comparing the 340 with a cipher that smokie had transposed on 55 different "period 19" lines, and the 340 didn't show promising results.

I have been thinking it over and have come down to these possibilities (in no specific order). The bigram period 19 thing:

1. is a fluke.
2. relates to transposition and the cipher is not in English.
3. relates to a special encoding process (something like _pi mentioned).
4. relates to transposition but with some misalignments, or unexpected distribution.
5. relates to transposition and the cipher is also somewhat polyalphabetic (wildcards?).

Or even combinations of the above. I personally think option 1 is the most unlikely. Beyond that I don't know, but I think we should try to find out whether misalignment could be the culprit for the low-scoring returns.


I will soon provide some basic discussion about whether the 340's period 19 statistics are a fluke.
smokie treats
 
Posts: 1034
Joined: Thu Feb 19, 2015 1:34 pm
Location: Lawrence, Kansas

Re: Homophonic substitution

Postby doranchak » Fri Dec 11, 2015 6:15 am

Jarlve wrote:I wrote a speed-oriented function (FreeBASIC) for you which outputs the score.


Excellent; thanks for sharing it! I will work on an implementation of it.

Re: Homophonic substitution

Postby doranchak » Fri Dec 11, 2015 6:26 am

Jarlve wrote:Given your results I don't think 20900 is significant.

I'm inclined to agree.
Jarlve wrote: I ran a test some days ago comparing the 340 with a cipher that smokie had transposed on 55 different "period 19" lines, and the 340 didn't show promising results.

I have been thinking it over and have come down to these possibilities (in no specific order). The bigram period 19 thing:

1. is a fluke.
2. relates to transposition and the cipher is not in English.
3. relates to a special encoding process (something like _pi mentioned).
4. relates to transposition but with some misalignments, or unexpected distribution.
5. relates to transposition and the cipher is also somewhat polyalphabetic (wildcards?).

Or even combinations of the above. I personally think option 1 is the most unlikely. Beyond that I don't know, but I think we should try to find out whether misalignment could be the culprit for the low-scoring returns.


I will keep working on the transposition explorer for a while. My hope is to locate candidates that have multiple measurements that seem to peak together, or to happen upon a candidate that scores highly in azdecrypt. But I fear I may be forced to return to the "cipher generator" approach, which systematically excludes specific encoding methods. That is an extremely tedious and time-consuming approach.

One more thing I'm wondering is if it is possible to fully automate azdecrypt. For example, from my transposition explorer, I can produce a list of 100 candidates and feed them into azdecrypt's input directory, and use the resulting scores to automatically direct my search to more promising candidates. But I would need to figure out how to tell azdecrypt to start its tasks automatically and exit when it's done. Is this feasible?

Re: Homophonic substitution

Postby doranchak » Fri Dec 11, 2015 8:56 am

Jarlve wrote:Raw scores with weight 5:
340: 2152
408 (capped 340): 2861


My implementation's results are:

340: 2144
408: 2873
first 340 of 408: 2861

Looks pretty close. I wonder why my 340 is off from yours a little. Maybe rounding errors? *shrug*

Re: Homophonic substitution

Postby Jarlve » Fri Dec 11, 2015 1:14 pm

smokie treats wrote:I will soon provide some basic discussion about whether the 340's period 19 statistics are a fluke.

Looking forward to it.

doranchak wrote:One more thing I'm wondering is if it is possible to fully automate azdecrypt. For example, from my transposition explorer, I can produce a list of 100 candidates and feed them into azdecrypt's input directory, and use the resulting scores to automatically direct my search to more promising candidates. But I would need to figure out how to tell azdecrypt to start its tasks automatically and exit when it's done. Is this feasible?

I have had similar thoughts. How about this: I add a new mode in which AZdecrypt periodically scans the Ciphers directory for files (wait mode). When a file is found, it exits wait mode, processes the file into the Results directory along with the input file, and then returns to wait mode.
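
The wait mode described above can be sketched in Python as a simple polling loop. This is not AZdecrypt code: the function names, the `solve` callback, the `_result.txt` naming and the polling interval are all hypothetical, chosen only to illustrate the scan, process and return-to-waiting cycle.

```python
import shutil
import time
from pathlib import Path


def process_pending(ciphers_dir, results_dir, solve):
    """Solve every file waiting in ciphers_dir; return the names handled."""
    ciphers_dir, results_dir = Path(ciphers_dir), Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    handled = []
    for path in sorted(p for p in ciphers_dir.iterdir() if p.is_file()):
        output = solve(path.read_text())  # hypothetical solver callback
        (results_dir / f"{path.stem}_result.txt").write_text(output)
        # Move the input next to its result so it is not picked up again.
        shutil.move(str(path), str(results_dir / path.name))
        handled.append(path.name)
    return handled


def wait_mode(ciphers_dir, results_dir, solve, poll_seconds=5.0, max_polls=None):
    """Poll ciphers_dir forever, or max_polls times when given."""
    polls = 0
    while max_polls is None or polls < max_polls:
        if not process_pending(ciphers_dir, results_dir, solve):
            time.sleep(poll_seconds)  # nothing to do: stay in wait mode
        polls += 1
```

A driver such as the transposition explorer could then drop candidate files into the Ciphers directory and watch the Results directory for the matching outputs. Writing each candidate under a temporary name elsewhere and moving it in once complete would avoid the scanner reading a half-written file.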

doranchak wrote:Looks pretty close. I wonder why my 340 is off from yours a little. Maybe rounding errors? *shrug*

Very strange. I ran my program without any optimizations or special settings and it still gave the same number. I guess the Sherlock Holmes approach would be to consider your 340 different from mine. But it may be a FreeBASIC problem; if so, I would like to report it.
