RE: Tutorial: Exploring raster and vector geographic data with rasterio and geopandas
Nice, thank you for your time, looking forward to that.
I see, it seems it will take ~800 MB of RAM. Be careful not to accidentally re-run the notebook (.ipynb) more than 7 times (800 MB × 8). If you want to re-run it, make sure to shut down the notebook first to free the memory.
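If you want to keep an eye on this, here's a rough sketch of checking the kernel's memory use from inside the notebook. The use of psutil is my assumption, not something from the tutorial; restarting the kernel is still the surest way to release everything:

```python
# Minimal sketch (assumes psutil is installed): report how much
# resident memory the current notebook kernel is using.
import os

import psutil

process = psutil.Process(os.getpid())
print(f"Resident memory: {process.memory_info().rss / 1024**2:.0f} MB")
```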
Actually, a while ago I did a project on an image-processing task, and after reading my notes I see I did something like lazy evaluation, which only loads and computes specific parts when they're needed. There were a few libraries I wanted to try back then, but only one of them is the one I actually used (time-constrained project 😂). Maybe you'd like to experiment with one of these libraries (if you hit memory problems or have access to a cluster of computers):
- Blaze: I'm interested in trying this but never got the chance. You can do batching by setting the chunk size, then do something like external-memory (out-of-core) computation (maybe).
- Spark/PySpark: At the time I wasn't really interested in using this because of the rigorous installation process for a single laptop, and it didn't seem to suit my use case.
- Dask: This is the one I used. It was really suitable for my use case because parts of the algorithm could run on the GPU, so I could distribute the computation between the CPU and GPU fairly evenly (though not perfectly). The downside is that it really takes time to build a good DAG. From my notes, it seems I used the delayed function a lot when loading images and doing pre-processing tasks; there's a sketch of that pattern after this list.
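In case it helps, here's a minimal sketch of that dask.delayed pattern. The file paths, the loader, and the normalization step are hypothetical placeholders, not my actual project code; the point is just that nothing is read or computed until compute() is called:

```python
# Minimal sketch of lazy image loading/pre-processing with dask.delayed.
# Paths and the preprocess step are made up for illustration.
import glob

import numpy as np
import dask
from dask import delayed


@delayed
def load_image(path):
    # Placeholder loader: in a real pipeline this might be
    # rasterio.open(path).read() or skimage.io.imread(path).
    return np.load(path)


@delayed
def preprocess(img):
    # Hypothetical pre-processing step: normalize values to [0, 1].
    img = img.astype("float64")
    return (img - img.min()) / (img.max() - img.min())


# Building the task graph is cheap and lazy: no file is touched yet.
tasks = [preprocess(load_image(p)) for p in glob.glob("images/*.npy")]

# Only now does Dask actually load and process the images,
# scheduling the tasks across workers (threads by default).
results = dask.compute(*tasks)
```

The nice part is that the graph describes the whole pipeline up front, so Dask can run independent load/preprocess pairs in parallel instead of holding every image in memory at once.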
Thanks for the great links, I didn't know a couple of these. I have not encountered the opportunity to handle data this big, but hopefully I will, and these options will come in handy. Looks like you know quite a bit about computationally intensive analysis. Hope to see some tutorials from you about this ;)