Python Generators: How To Efficiently Fetch Data From Databases
Introduction
As data engineers, we often have to fetch a particularly large dataset from an operational database, apply some transformations to it, and write the result to an analytical database or to cloud object storage (such as an S3 bucket).
What if the dataset is too large to fit into memory, yet using distributed computing is neither worthwhile nor feasible?
In this case, we need to find a way to get the job done without disrupting the work of our colleagues on the data team, for example by…