There are two basic paradigms for building a data processing pipeline: Extract-Transform-Load (ETL) and Extract-Load-Transform (ELT). ETL is still the default, but it has a lot of drawbacks, and it’s becoming obvious that building an ELT pipeline is the better choice.

First of all, there’s actually no such thing as a pure ETL pipeline. There will always have to be another Transform step after the data is loaded into the data warehouse. You’ll end up having an ETLT process or two ETL pipelines joined together.

ETL pipelines are tricky to build correctly. Each integration has subtleties that, if handled wrong, can be costly. At best, you’ll lose time and money rebuilding it. At worst, you’ll lose data and produce incorrect analyses.

ETL pipelines are even trickier to operate. You need to test not just the code, but also the data. You need to set up a good deployment and monitoring process. You want to log both success and error metrics. Don’t forget about alerting. Do your data engineers want to be on-call? The list goes on and on.
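To make “testing the data” a bit more concrete, here is a minimal sketch of the kind of checks you might run on a freshly loaded batch. The function name `check_loaded_rows`, the `CheckResult` type, and the three checks are illustrative assumptions, not a prescribed framework; a real pipeline would likely need many more of them.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_loaded_rows(rows, required_fields, key_field):
    """Run a few basic data-quality checks on a batch of rows (list of dicts).

    Hypothetical helper for illustration only.
    """
    results = []

    # Check 1: the batch is not empty.
    results.append(CheckResult("non_empty", len(rows) > 0, f"{len(rows)} rows"))

    # Check 2: no required field is missing or None.
    incomplete = [
        i for i, row in enumerate(rows)
        if any(row.get(field) is None for field in required_fields)
    ]
    results.append(
        CheckResult("required_fields", not incomplete,
                    f"{len(incomplete)} rows with missing values")
    )

    # Check 3: the key field is unique across the batch.
    keys = [row.get(key_field) for row in rows]
    duplicates = len(keys) - len(set(keys))
    results.append(
        CheckResult("unique_key", duplicates == 0,
                    f"{duplicates} duplicate keys")
    )
    return results


if __name__ == "__main__":
    batch = [{"id": 1, "amount": 10}, {"id": 1, "amount": None}]
    for result in check_loaded_rows(batch, ["id", "amount"], "id"):
        print(result)
```

Each failed check is something you would want to log and alert on, which is exactly the operational overhead described above.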

ETL pipelines are also inherently inflexible. They need to be rigid to give the “most correct” data possible, but this also makes them more difficult to adapt. And adapt they must, as the world around them keeps changing all the time. Whether it’s a new API version or a new business requirement, you’ll need to incorporate the change. To do so, a data engineer and a data analyst need to work in tandem.

This leads to another problem with ETL, this time in organizational design. Regardless of how you structure your data team, a data engineer will always have less skin in the game – they’ll never directly take the blame and lose credibility for wrong data in a BI dashboard. They’ll feel less responsible for, and hence less interested in, doing the meticulous work necessary. Also, needing a data engineer to change a pipeline leads to a slower pace of development overall, a huge competitive disadvantage in today’s world.

All of this makes building and running an ETL process a slow, expensive, and complex undertaking. The truth is, the Extract and Load steps are undifferentiated heavy lifting – they are not specific to any company, yet every company needs to do them to have any chance of getting insights from its data. So why do them yourself when there’s a better alternative in the form of ELT?

Let someone with way more experience and expertise handle the EL so you can focus on the T.

You’ll get your data sooner, faster, and more reliably. You’ll save money on paying extra data engineers (my guesstimate is that with ETL, the data engineer to data analyst ratio is around 1:2, whereas with ELT it’s closer to 1:5). You’ll make your data analysts faster, more independent, and happier.
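Here is a rough sketch of what “focusing on the T” can look like. It assumes the raw data has already been loaded untouched into the warehouse, and uses an in-memory SQLite database as a stand-in for that warehouse. The table `raw_orders`, the view `revenue_by_customer`, and the sample rows are all made up for illustration.

```python
import sqlite3

# Stand-in for a data warehouse; a real setup would use a hosted one.
conn = sqlite3.connect(":memory:")

# "EL": raw records land as-is. In an ELT setup, someone else's tooling
# would do this part; it is simulated here with a plain insert.
conn.execute(
    "CREATE TABLE raw_orders "
    "(id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        (1, "acme", 1250, "paid"),
        (2, "acme", 800, "refunded"),
        (3, "globex", 4300, "paid"),
    ],
)

# "T": the analyst-owned step, written as plain SQL inside the warehouse.
conn.execute(
    """
    CREATE VIEW revenue_by_customer AS
    SELECT customer, SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    WHERE status = 'paid'
    GROUP BY customer
    """
)

rows = conn.execute(
    "SELECT customer, revenue FROM revenue_by_customer ORDER BY customer"
).fetchall()
print(rows)
```

Because the raw table is left untouched, changing a business rule (say, how refunds are treated) only means editing the view. No one has to re-extract or reload anything, and the analyst can make the change on their own.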

Standard ETL has been around for a long time, but its time has passed. With modern tools, there’s no point in not doing ELT. Ask yourself this – if you have to choose between a slow, error-prone, expensive way of achieving a goal and a fast, reliable, cheaper alternative, which one would you go for?
