My work

Re: My work

Postby Jarlve » Sat Aug 15, 2015 4:50 am

When moving the AZdecrypt solving subroutine to another program I noticed that it suddenly worked much better. I have identified the problem (it made each cipher slightly polyalphabetic!) and I'm working on yet another update that will be really worth it. It'll include another option "keys per iteration/random restart" that will be beneficial for solving hard ciphers.

Maybe it's good that my solver was plagued by this problem since the start, because it really made me go all the way to improve upon it. In terms of the validity of the tests I did, they should still be valid since it was already solving higher multiplicity ciphers than the 340. But I may opt to rerun some tests starting from next year.

The jroberson 405 cipher still does not solve because the solution scores lower than the local optima; solving it will require higher-order n-grams. I have tested that the solver can now solve ciphers in the 0.3-0.4 multiplicity range, depending of course on how difficult the cipher is, which in turn depends on a further set of factors that are often unknown.

Postby doranchak » Sat Aug 15, 2015 4:54 am

That's great news, Jarlve. I look forward to the update!

Postby Jarlve » Sun Aug 16, 2015 7:11 am

Download AZdecrypt097.

- A bug was fixed that greatly enhances the general performance of the solver.
- A new setting, "keys per iteration", is introduced, which allows the user to increase the search depth of the hill climber. Very beneficial for hard-to-solve, high-multiplicity ciphers.

Under ideal circumstances (high scoring plaintext) the solver should be able to operate up to a multiplicity of about 0.4.
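For readers new to the term, here is a minimal sketch of how multiplicity is usually computed. It assumes the common definition (distinct symbols divided by cipher length); the function name and the toy cipher are illustrative only and are not taken from AZdecrypt.

```python
def multiplicity(cipher_symbols):
    """Ratio of distinct symbols to cipher length (assumed definition)."""
    if not cipher_symbols:
        raise ValueError("cipher must not be empty")
    return len(set(cipher_symbols)) / len(cipher_symbols)


# Toy cipher: 6 distinct symbols spread over 10 positions.
toy = [1, 2, 3, 1, 4, 2, 5, 1, 3, 6]
print(multiplicity(toy))  # 0.6
```

A higher value means fewer repeats per symbol, so the solver has less to work with.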

Postby Jarlve » Mon Aug 17, 2015 1:30 am

I'm working on 6-gram support for AZdecrypt. I have used the Usenet corpus (thanks to daikon for making me aware of it). After cleaning up the corpus it was still 23.2 GB, and the total size of the 6-grams is almost 500 MB! Does anyone happen to know an even bigger corpus?

8-)

The 5-gram solver will be kept as it is, and the 6-gram solver will be introduced as a new solving module that will probably be much slower. To use the 6-gram solver your computer will probably need 2 or 4 GB of RAM to work efficiently. I can't give an ETA because I'm not sure how long the optimization process will take, but it should be ready within a month or so. I also plan to improve the 5-gram solver a bit after the 6-gram release.
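As a rough illustration of what building and using a letter 6-gram table involves, here is a small sketch. It is not AZdecrypt's code: the function names, the log10 scoring, and the `floor` value for unseen n-grams are all assumptions made for the example.

```python
import math
from collections import Counter


def clean(text):
    """Keep only the letters A-Z, uppercased (spaces and punctuation dropped)."""
    return "".join(ch for ch in text.upper() if "A" <= ch <= "Z")


def ngram_log_scores(text, n=6):
    """Map each n-gram seen in the text to log10 of its relative frequency."""
    letters = clean(text)
    counts = Counter(letters[i:i + n] for i in range(len(letters) - n + 1))
    total = sum(counts.values())
    return {gram: math.log10(c / total) for gram, c in counts.items()}


def score(plaintext, table, n=6, floor=-10.0):
    """Sum the table scores of every n-gram; unseen n-grams get `floor`."""
    letters = clean(plaintext)
    return sum(table.get(letters[i:i + n], floor)
               for i in range(len(letters) - n + 1))


table = ngram_log_scores("the cat and the hat", n=3)
print(score("THE HAT", table, n=3))
```

On a real corpus the table would be built once and stored on disk; the dictionary here only stands in for whatever compact layout a solver would actually use.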

Postby glurk » Mon Aug 17, 2015 4:05 am

Jarlve-

Do you think that using 6-grams will be of benefit? I'm not saying they won't but there is a point of diminishing returns somewhere. I have not done the testing myself, but I've read up on it a lot, and many researchers believe (maybe a consensus) that 3 and 4-grams are optimal.

Longer n-grams are not going to be a panacea. At some point, it becomes less about general English language stats and more about the specific words / phrases used in the corpora.

I'm not saying it is wrong to try it, but I don't think longer and longer n-grams are going to be much help. Just my opinion.

-glurk
--------------------------------
I don't believe in monsters.

Postby doranchak » Mon Aug 17, 2015 4:14 am

Jarlve wrote:I'm working on 6-gram support for AZdecrypt. I have used the Usenet corpus (thanks to daikon for making me aware of it). After cleaning up the corpus it was still 23.2 GB, and the total size of the 6-grams is almost 500 MB! Does anyone happen to know an even bigger corpus?


The Google Ngrams data sets are quite large: http://storage.googleapis.com/books/ngr ... etsv2.html

Just for fun I added up the file sizes for all the 5-grams for American English (Version 20120701): 187.2 GB

I look forward to your 6-grams update!

Postby Quicktrader » Mon Aug 17, 2015 4:36 am

The possibility to define certain symbols as e.g. L, T or E would be very, very, very cool...

Excellent work, btw.

QT
*ZODIACHRONOLOGY*

Postby Jarlve » Mon Aug 17, 2015 4:57 am

Thanks all,

glurk wrote:Do you think that using 6-grams will be of benefit?

Early testing indicates that they are of benefit, but there are indeed diminishing returns, which was anticipated. The way I test is by removing rows from ciphers: if 6-grams mean I can remove one more row from a cipher and still get a solve where 5-grams previously failed, I'll gladly take it! It's all about pushing small improvements. So I believe that n-grams that fit in memory are worth taking. I guess you are suggesting that I start looking at word-level n-grams, and that is certainly something I should start to consider.
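A rough sketch of that row-removal test, for anyone who wants to try it with their own solver. The assumptions are mine: rows are dropped from the end of the cipher, and `solves` is a hypothetical callback standing in for "run the solver and check the result".

```python
def split_rows(cipher, width):
    """Cut a flat list of symbols into rows of `width` symbols."""
    return [cipher[i:i + width] for i in range(0, len(cipher), width)]


def drop_last_rows(cipher, width, k):
    """Return the cipher with its last k rows removed (flattened again)."""
    rows = split_rows(cipher, width)
    if not 0 <= k < len(rows):
        raise ValueError("k must leave at least one row")
    return [sym for row in rows[:len(rows) - k] for sym in row]


def max_rows_removable(cipher, width, solves):
    """Largest k for which `solves` still succeeds on the shortened cipher."""
    removed = 0
    for k in range(1, len(split_rows(cipher, width))):
        if not solves(drop_last_rows(cipher, width, k)):
            break
        removed = k
    return removed


# Toy stand-in: 4 rows of 5 symbols, and a fake "solver" that needs 10 symbols.
toy = list(range(20))
print(max_rows_removable(toy, 5, lambda c: len(c) >= 10))  # 2
```

Comparing the result for a 5-gram and a 6-gram scoring function on the same cipher gives the "one more row" measurement described above.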

doranchak wrote:Just for fun I added up the file sizes for all the 5-grams for American English (Version 20120701): 187.2 GB

I saw that too. Do you or glurk have any ideas on how to implement word-level n-grams into a solver?

Quicktrader wrote:The possibility to define certain symbols as e.g. L, T or E would be very, very, very cool...

You mean locking certain symbols to letters like in ZKDecrypto? I agree. I could push an updated version of Examine which has this functionality but it is not very user friendly (at first) and I can't afford/offer (time-wise) much support.

Postby doranchak » Mon Aug 17, 2015 5:11 am

Jarlve wrote:
doranchak wrote:Just for fun I added up the file sizes for all the 5-grams for American English (Version 20120701): 187.2 GB

I saw that too. Do you or glurk have any ideas on how to implement word-level n-grams into a solver?


Funny - I was just chatting with glurk about this. :)

I think there is some benefit to dictionary attacks with long n-gram sequences if the constraints imposed by the relevant sequence of cipher text are large enough to weed out all the spurious candidate texts. I.e., a snippet of cipher text that has enough repeating symbols will exclude many possibilities.

In an unlimited memory/speed fantasy world, I would index all those ngrams such that they could be queried by those constraints. Then it becomes possible to consider sets of constraints (groups of cipher text snippets that each have strong constraints). I would try to search for a set of small-ish cipher text snippets that maximizes the number of symbols shared between snippets. This significantly prunes the search space since plaintext under consideration for one snippet would inform the choices for other snippets.

I'm thinking of Edwin Olson's dictionary attacks for short cryptograms, but expanding the idea to handle longer n-gram sequences, and to handle substring searches: https://april.eecs.umich.edu/pdfs/olson2007crypt.pdf
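To make the constraint idea concrete, here is a toy sketch. It is not Olson's algorithm and not an existing tool; all names are made up for illustration. It assumes a homophonic cipher, so a repeated cipher symbol must always decode to the same letter, while different symbols may share a letter.

```python
def assignment(snippet, candidate):
    """Symbol -> letter mapping implied by decoding snippet as candidate.

    Returns None if the lengths differ or a symbol would need two letters.
    """
    if len(snippet) != len(candidate):
        return None
    mapping = {}
    for sym, letter in zip(snippet, candidate):
        if mapping.setdefault(sym, letter) != letter:
            return None
    return mapping


def compatible(a, b):
    """True if two mappings agree on every symbol they share."""
    return all(b.get(sym, letter) == letter for sym, letter in a.items())


def joint_candidates(snip1, cands1, snip2, cands2):
    """Pairs of candidate plaintexts that fit both snippets at once."""
    pairs = []
    for c1 in cands1:
        m1 = assignment(snip1, c1)
        if m1 is None:
            continue
        for c2 in cands2:
            m2 = assignment(snip2, c2)
            if m2 is not None and compatible(m1, m2):
                pairs.append((c1, c2))
    return pairs


# Cipher symbols are written as letters here; "ABCA" repeats symbol A.
print(joint_candidates("ABCA", ["THAT", "THIS", "ELSE"],
                       "CDA", ["ANT", "SET"]))
# [('THAT', 'ANT')]
```

"THIS" is rejected by the repeated symbol alone, and the shared symbols A and C rule out every remaining pair except one, which is the pruning effect described above. A real version would need an index over the n-gram data instead of these nested loops.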

Postby glurk » Mon Aug 17, 2015 5:44 am

Jarlve-

I may be getting off-topic here, I should probably start a new thread about n-grams, but an idea I once had was negative scoring for longer n-grams that NEVER appear in any corpora. Especially with longer n-grams like 6-grams, things like "QSTVPK," which would likely never appear could be given a negative score.

This never got past the idea stage with me, since something like "ZZZZZZ" could appear, meaning sleeping, snoring, etc. And I have no idea how to determine negative scores for things that never actually occur in English.

It DID give me an idea, which I never fully developed but considered. The idea is "longest string of consonants." Basically, in English you will almost never find a string of letters longer than 7 that does not contain a vowel. There are very few words or phrases that will not have an A,E,I,O,U in a 7 letter section.

I never used it, but I still like the idea, and I think it might be a good addition to your solver.

-glurk

EDIT: In fact, in this post the only >= 7-grams that have no vowels are "QSTVPKwh" and "ZZZZZZc"

EDIT #2: I don't think this could be used directly in a hillclimber / solver as part of the solving process. But it might be a good metric to weed out bad prospective solutions from good ones. The 408 plaintext fits this metric, as well as all of Zodiac's known writings, as far as I can tell.
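A minimal sketch of that "longest string of consonants" check, used as a filter on candidate solutions. The names and the `max_run=7` cutoff are assumptions (the exact threshold is left open above); Y counts as a consonant here because only A, E, I, O, U are treated as vowels.

```python
def longest_vowel_free_run(text, vowels="AEIOU"):
    """Length of the longest run of letters containing no vowel.

    Spaces and punctuation are ignored, so runs can span word boundaries.
    """
    best = run = 0
    for ch in text.upper():
        if not ch.isalpha():
            continue
        run = 0 if ch in vowels else run + 1
        best = max(best, run)
    return best


def passes_consonant_filter(text, max_run=7):
    """Reject candidate plaintexts whose vowel-free run exceeds max_run."""
    return longest_vowel_free_run(text) <= max_run


print(longest_vowel_free_run("QSTVPKwh"))               # 8
print(longest_vowel_free_run("I like killing people"))  # 3
print(passes_consonant_filter("QSTVPKwh"))              # False
```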
--------------------------------
I don't believe in monsters.