You are viewing a single comment's thread from:

RE: Learn Python Series (#14) - Mini Project - Developing a Web Crawler Part 2

in #utopian-io, 7 years ago

There's a third option that you could've gone for with saving the intermediate state – you could have used the pickle module and simply saved out a representation of the data structure holding the lists of names still to do, those done, and the full set. That is pretty efficient, even though it doesn't really mimic the way you're using the files as a persistent buffer here.
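As a minimal sketch of that idea (the state layout and filename here are my own invention, not the tutorial's): the whole crawl state can be dumped and restored as one object with pickle.

```python
import pickle

# Hypothetical crawl state: names still to do, names done, and the full set.
state = {
    "todo": ["alice", "bob"],
    "done": ["scipio"],
    "all": ["scipio", "alice", "bob"],
}

# Save the whole data structure in one go...
with open("crawl_state.pkl", "wb") as f:
    pickle.dump(state, f)

# ...and later restore it exactly as it was, no line-by-line parsing needed.
with open("crawl_state.pkl", "rb") as f:
    restored = pickle.load(f)
```

One dump/load pair replaces all the reading, splitting, and rewriting of the text-file approach, at the cost of the file no longer being human-readable.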

And this method has a strange advantage: as long as you can get the system to make use of file locking appropriately, you can actually treat those file locks as process locks and distribute the actual processing of the creation lists over multiple machines. Effectively a poor man's cluster, which is great fun and I encourage everyone to try building at least once in their lives.
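A rough sketch of the locking idea, assuming a POSIX system (the fcntl module is Unix-only) and a shared filesystem where flock behaves correctly; the function name and lock-file convention are hypothetical:

```python
import fcntl
import os

def claim(lock_path):
    """Try to claim a work item by taking an exclusive, non-blocking
    flock() on its lock file. Returns the open fd on success, or None
    if another worker (possibly on another machine) already holds it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd  # keep it open: closing the fd releases the lock
    except BlockingIOError:
        os.close(fd)
        return None
```

Each worker loops over the to-do list and only processes names whose lock it manages to claim, so the machines divide the work among themselves without any central coordinator.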

Probably not what you intended, but certainly amusing.


I know ;-)
But because the Learn Python Series is intended as a book, I'm limiting myself to only using what was covered in the previous episodes! And pickle hasn't been explained yet! ;-)

While writing this, I initially used just one .txt file holding account_name,status lines (like: scipio,todo). But that of course led to a nested list, and I hadn't explained the any() function yet, so I changed my code to use 3 files functioning as persistent buffers ;-)
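To illustrate why the single-file layout pulls in any() (the sample names and the helper is_todo are mine, not from the series):

```python
# Hypothetical single-file layout: one "account_name,status" line per account.
lines = ["scipio,done", "utopian-io,todo", "steemit,todo"]

# Splitting each line produces a nested list of [name, status] pairs.
records = [line.split(",") for line in lines]

def is_todo(name):
    # Checking membership in a nested list needs something like any(),
    # whereas with three flat files a plain "name in todo_list" suffices.
    return any(acct == name and status == "todo" for acct, status in records)
```

With three separate files, each one reads into a flat list of names, so the simple `in` operator already covered in earlier episodes is enough.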
