RE: Learn Python Series (#14) - Mini Project - Developing a Web Crawler Part 2
There's a third option you could have gone for when saving the intermediate state: you could have used the `pickle` module and simply saved out a representation of the data structure holding the to-do list, the done list, and the full list of names. That is pretty efficient, even though it doesn't really mimic the way you're using the files as a persistent buffer here.
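Something like this, for instance (a rough sketch; the file name and the dict layout are just my assumptions, not anything from the episode):

```python
import os
import pickle

STATE_FILE = 'crawler_state.pkl'  # hypothetical file name

def save_state(state):
    # Dump the entire state (to-do, done, and all names) in one write.
    with open(STATE_FILE, 'wb') as f:
        pickle.dump(state, f)

def load_state():
    # Restore a previous run's state, or start from scratch.
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE, 'rb') as f:
            return pickle.load(f)
    return {'todo': [], 'done': [], 'all': []}
```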
And this method has a curious advantage: as long as you can get the system to use file locking appropriately, you can treat those file locks as process locks and distribute the actual processing of the lists over multiple machines. Effectively a poor man's cluster, which is great fun and something I encourage everyone to try building at least once in their lives.
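Very roughly, the claiming side could look like this (a sketch only, assuming a Unix-like system, since `fcntl` doesn't exist on Windows, and a shared filesystem whose locking you actually trust; NFS is notoriously shaky here):

```python
import fcntl

def try_claim(lock_path):
    # Open (or create) the lock file and try to take an exclusive,
    # non-blocking lock on it. If another process, possibly on another
    # machine sharing the filesystem, already holds the lock, flock()
    # raises and we back off instead of waiting.
    f = open(lock_path, 'w')
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f  # keep this object around: closing it releases the lock
    except OSError:
        f.close()
        return None  # someone else is processing this chunk

lock = try_claim('todo_chunk_0.lock')  # hypothetical per-chunk lock file
if lock is not None:
    pass  # ...process this chunk of account names, then lock.close()
```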
Probably not what you intended, but certainly amusing.
I know ;-)
But because the *Learn Python Series* is intended as a book, I'm limiting myself to only using what was covered in the previous episodes! And `pickle` hasn't been explained yet! ;-)

While writing this, I initially used just one .txt file holding `account_name,status` lines (like: `scipio,todo`). But that of course led to a nested list, and I hadn't explained the `any()` function yet, so I changed my code to using 3 files functioning as persistent buffers ;-)
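Just to show why that single-file format pulls `any()` into the picture (sketch only; the file name here is hypothetical):

```python
# Hypothetical single-file layout: one "account_name,status" line per
# account, parsed into a nested list like [['scipio', 'todo'], ...].
with open('accounts.txt') as f:
    accounts = [line.strip().split(',') for line in f]

# A plain membership test ('scipio' in accounts) won't look inside the
# sub-lists, so checking whether an account is listed needs any():
already_listed = any(name == 'scipio' for name, status in accounts)
```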