ETL vs ELT: Why the Difference Matters in 2025
The question that seems settled but isn’t
Ask any data engineer in 2025 “ETL or ELT?” and the majority answer will be “ELT, obviously.” The cloud data warehouse has won. dbt has won. The era of monolithic ETL engines like Informatica and SSIS is over.
Except it isn’t. Not for everyone.
Sixty percent of the mid-size companies we work with still run classic ETL pipelines. Not because they’re technophobic, but because their data has characteristics that pure ELT doesn’t solve well, or because the migration cost isn’t justified at their scale. And many companies that adopted ELT “because it’s modern” are discovering their Snowflake or BigQuery bill is significantly higher than expected.
Reality, as always, is more nuanced than the dominant narrative.
The patterns, without jargon
ETL (Extract, Transform, Load) extracts data from sources, transforms it on an intermediate server (the ETL engine), and loads the transformed result into the destination. Heavy processing happens outside the data warehouse. Classic tools: Informatica PowerCenter, SSIS, Talend, Pentaho.
ELT (Extract, Load, Transform) extracts data from sources, loads it raw into the data warehouse, and runs transformations inside the warehouse using its compute capacity. Modern ecosystem tools: Fivetran or Airbyte for extract-load, dbt for SQL transformations.
The difference seems minor (just two letters swapped) but has profound implications for cost, architecture, required skills, and operations.
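The two patterns can be sketched side by side. This is a minimal illustration using SQLite as a stand-in for the warehouse and hypothetical order rows as the source; the table and field names are invented for the example.

```python
import sqlite3

# Hypothetical raw rows, standing in for an extract from an API or file.
raw_rows = [
    {"id": 1, "amount": "19.90", "country": "es"},
    {"id": 2, "amount": "5.00", "country": "DE"},
]

# --- ETL: transform in the pipeline process, load the finished result ---
def etl(rows, conn):
    transformed = [(r["id"], float(r["amount"]), r["country"].upper()) for r in rows]
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)

# --- ELT: load raw first, then transform with SQL inside the warehouse ---
def elt(rows, conn):
    conn.execute("CREATE TABLE raw_orders (id INTEGER, amount TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(r["id"], r["amount"], r["country"]) for r in rows],
    )
    # In a real ELT stack, this SQL would live in a dbt model.
    conn.execute(
        "CREATE TABLE orders AS "
        "SELECT id, CAST(amount AS REAL) AS amount, UPPER(country) AS country "
        "FROM raw_orders"
    )

etl_conn, elt_conn = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
etl(raw_rows, etl_conn)
elt(raw_rows, elt_conn)
# Both patterns end with the same orders table; what differs is *where* the
# transformation compute ran: the pipeline process vs. the warehouse engine.
```

Both connections end up with identical `orders` tables; the two letters that swap in the acronym correspond exactly to which system paid for the transformation.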
What changed: the cloud data warehouse
The ELT pattern was unviable 10 years ago because on-premise data warehouses (Oracle, Teradata) had limited and expensive compute capacity. Loading raw data and transforming within the warehouse would have saturated the system.
Three products changed the equation:
BigQuery (Google, 2012) introduced storage-compute separation and pay-per-query pricing. Storing petabytes of data costs pennies; processing costs based on data scanned.
Snowflake (2014) pushed storage-compute separation further with elastic warehouses that scale independently. Need more transformation power? Scale the warehouse up during execution, scale it down after.
Redshift (Amazon, 2013) democratized the cloud data warehouse, though its fixed-cluster model resembles on-premise more than BigQuery or Snowflake.
With elastic, cheap compute, running heavy transformations inside the warehouse stopped being a technical problem. What previously required a dedicated ETL server now runs as SQL in Snowflake. And SQL is a language far more people know than Java or Python, which democratized access to data engineering.
ELT: the real advantages
Architectural simplicity. Instead of three systems (source, ETL engine, warehouse), you have two: source and warehouse. The transformation engine is the warehouse itself. Fewer systems, fewer failure points, less infrastructure to manage.
Iteration speed. Changing a transformation in dbt means editing a SQL query, running `dbt run`, and seeing results in minutes. Changing a transformation in Informatica requires opening a proprietary development environment, modifying a visual mapping, publishing, deploying, and executing. The feedback loop difference is orders of magnitude.
Native version control. dbt models are SQL files in a Git repository. You can code-review a data transformation with the same workflow you use for application code. Classic ETL tools store their logic in binary metadata inside a proprietary server.
Integrated testing. dbt allows defining quality tests directly on models: uniqueness, not-null, referential integrity, custom business rules. Tests run as part of the pipeline. For data governance, this capability is transformative.
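A dbt test definition is just YAML alongside the model. A minimal sketch (the model and column names here are hypothetical):

```yaml
# models/staging/schema.yml — stg_orders is an invented model name
version: 2
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - relationships:
              to: ref('stg_customers')
              field: customer_id
```

Running `dbt test` executes these as SQL queries against the warehouse and fails the pipeline if any rows violate them.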
Automatic lineage. dbt generates a dependency graph across all models automatically. You can see which tables feed a dashboard with a single command. In classic ETL, reconstructing lineage requires external tools or manual documentation.
ETL: when it’s still the right answer
The “ELT for everything” narrative ignores real scenarios where classic ETL remains superior:
Sensitive data that shouldn’t be loaded raw. If you have data with PII (personally identifiable information) that must be anonymized or pseudonymized before entering the warehouse, you need to transform before loading. Loading data containing national IDs, account numbers, or medical histories into a warehouse staging area to transform later is a compliance risk. Anonymization must happen before the data touches the warehouse.
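A minimal sketch of pre-load pseudonymization, the transform-before-load step the paragraph describes. The field names and the keyed-hash approach are illustrative assumptions, not a prescribed scheme; the key must live outside the warehouse (e.g. in a secrets manager).

```python
import hashlib
import hmac

# Hypothetical secret, kept outside the warehouse (vault/KMS in practice).
PEPPER = b"rotate-me-outside-the-warehouse"

def pseudonymize(value: str) -> str:
    """Deterministic keyed hash: the same input always maps to the same token,
    so joins across tables still work, but the real ID never reaches the
    warehouse staging area."""
    return hmac.new(PEPPER, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def transform_before_load(record: dict) -> dict:
    # The transform happens here, in the pipeline, *before* any load step.
    out = dict(record)
    out["national_id"] = pseudonymize(record["national_id"])
    return out

row = {"customer": "Ana", "national_id": "12345678Z"}
safe_row = transform_before_load(row)
# safe_row carries a 16-character token instead of the real national ID.
```

Because the hash is deterministic, downstream models can still join on the token; only the mapping back to the real ID is impossible without the key.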
Sources with complex formats. Nested XML, files with mixed encoding, semi-structured data with inconsistent schemas, mainframe fixed-length files. Trying to load these raw into BigQuery or Snowflake and transform with SQL is a nightmare. An ETL engine with advanced parsing capabilities (or a Python script with pandas) solves the problem before data reaches the warehouse.
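The fixed-length case is the clearest example of parsing that SQL handles badly and a few lines of Python handle trivially. A sketch with an invented record layout (the column positions and field names are assumptions for illustration):

```python
# Hypothetical mainframe-style layout: cols 0-9 account, 10-17 date (YYYYMMDD),
# 18-27 amount in cents, zero-filled as fixed-length extracts often are.
LAYOUT = [("account", 0, 10), ("date", 10, 18), ("amount_cents", 18, 28)]

def parse_line(line: str) -> dict:
    """Slice one fixed-length record into typed fields before it is loaded."""
    record = {name: line[start:end].strip() for name, start, end in LAYOUT}
    record["amount_cents"] = int(record["amount_cents"])  # type conversion pre-load
    return record

raw = "ES12345678202501150000019900"
parsed = parse_line(raw)
```

Doing this slicing in warehouse SQL with `SUBSTR` over a single raw text column is possible, but every layout change becomes a warehouse migration instead of a one-line edit in the parser.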
Extreme volume with simple transformations. If you process 500 million records daily but the transformation is trivial (rename fields, convert types, filter invalid records), running that transformation inside Snowflake generates significant compute cost. A Spark pipeline that transforms and loads directly can be 5-10x cheaper for that specific pattern.
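The “trivial transformation” pattern looks like this as a streaming sketch. In production at 500 million records it would be a Spark or similar job; this plain-Python generator shows the shape of the work (the field names are invented):

```python
def transform(records):
    """Row-at-a-time transform: rename fields, convert types, filter invalid
    records. No warehouse compute involved — the case where transform-then-load
    can be far cheaper than paying the warehouse to do it."""
    for r in records:
        if r.get("status") == "invalid":
            continue  # drop invalid records before they are ever loaded
        yield {
            "customer_id": int(r["cust_id"]),  # rename + type conversion
            "amount": float(r["amt"]),
        }

source = [
    {"cust_id": "1", "amt": "10.5", "status": "ok"},
    {"cust_id": "2", "amt": "0.0", "status": "invalid"},
    {"cust_id": "3", "amt": "7.25", "status": "ok"},
]
clean = list(transform(source))
```

Because the transform is stateless and row-local, it parallelizes trivially, which is exactly why a Spark pipeline handles it cheaply without touching warehouse credits.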
Operational integrations. If transformed data goes to an operational system (a CRM, an ERP, an API) rather than a warehouse, the ELT pattern doesn’t apply. You need ETL or an integration tool (MuleSoft, Workato, n8n) that extracts, transforms, and loads into the operational destination.
Limited budget with low volume. If you have 5 data sources, a couple million records, and a 2-person team, a Python pipeline with Airflow can be more cost-effective than Fivetran + Snowflake + dbt. Modern data stack tools carry minimum costs that aren’t always justified at small volumes.
The real cost: the conversation nobody has
The modern data stack (Fivetran + Snowflake + dbt Cloud) is elegant, productive, and potentially expensive. Typical monthly breakdown for a mid-size company:
| Component | Monthly range |
|---|---|
| Snowflake (storage + compute) | EUR 500-5,000 |
| Fivetran (extract-load) | EUR 500-3,000 |
| dbt Cloud | EUR 100-500 |
| Total | EUR 1,100-8,500 |
Compared with a Linux server running Airflow, Python, and PostgreSQL as a warehouse:
| Component | Monthly range |
|---|---|
| Server (4 CPU, 16 GB RAM) | EUR 50-100 |
| Managed PostgreSQL | EUR 50-200 |
| Total | EUR 100-300 |
The difference is an order of magnitude. Obviously the second stack has scale limitations the first doesn’t, and requires more engineering expertise to maintain. But for a company processing 10 million records daily (which is a lot for most SMEs), PostgreSQL with partitioning and good indexes can be sufficient.
The common mistake is adopting the modern data stack “because that’s what scale-ups use” without calculating whether the volume and complexity justify it. A 50-person company doesn’t need Spotify’s data infrastructure.
The migration path: ETL to ELT
If you decide to migrate from classic ETL to ELT, these are the steps we’ve validated in real projects:
Phase 1: Inventory and prioritization. Document all existing ETL pipelines: sources, transformations, destinations, frequency, SLAs. Prioritize migration by impact (pipelines feeding critical business decisions first) and complexity (simplest first to build momentum).
Phase 2: Parallel extract-load. Configure Fivetran, Airbyte, or a custom extractor to load sources raw into the new warehouse. Run in parallel with the existing ETL for 2-4 weeks to validate that raw data is correct and complete.
Phase 3: Rewrite transformations in dbt. Convert ETL mappings into dbt models (SQL). This isn’t an automatic translation; it’s a rewrite. Take the opportunity to clean up logic that accumulated over years: CTEs instead of chained temporary tables, consistent naming, inline documentation.
Phase 4: Cross-validation. Run the new pipeline and the old one in parallel. Compare results. Differences reveal bugs (sometimes in the new pipeline, sometimes in the old one that nobody had detected).
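A minimal sketch of the cross-validation step: compare the old and new pipeline outputs key by key and surface every discrepancy. The key name and row shapes are illustrative assumptions; in practice you would run this per table, per day.

```python
def compare(old_rows, new_rows, key="id"):
    """Compare old- and new-pipeline outputs: keys present on only one side,
    plus keys whose rows differ in any field."""
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    return {
        "only_in_old": sorted(old.keys() - new.keys()),
        "only_in_new": sorted(new.keys() - old.keys()),
        "mismatched": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }

legacy = [{"id": 1, "total": 100.0}, {"id": 2, "total": 55.0}]
rewrite = [{"id": 1, "total": 100.0}, {"id": 2, "total": 54.0}, {"id": 3, "total": 9.0}]
report = compare(legacy, rewrite)
# report flags id 2 as mismatched and id 3 as present only in the new pipeline
```

An empty report for two consecutive weeks is the cutover signal described in Phase 5; a non-empty one tells you exactly which rows to investigate, on either side.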
Phase 5: Cutover. When results match for 2 consecutive weeks, shut down the old ETL. Keep access to the old server for 30 days just in case.
Typical timeline: 2-6 months for a 20-50 pipeline migration, depending on transformation complexity and team availability.
The hybrid pattern: the pragmatic answer
Most organizations we advise end up with a hybrid pattern:
- ELT for structured data from SaaS applications (CRM, ERP, marketing tools) loaded into the warehouse and transformed with dbt.
- ETL for sensitive data requiring pre-load anonymization, complex formats needing parsing, and operational integrations that don’t go through the warehouse.
It’s not purist, but it works. And in data engineering, “works” is more valuable than “elegant.”
Five questions before choosing
Before deciding between ETL and ELT (or hybrid), answer these five questions:
- What daily data volume do you process? Below 10 million records, any pattern works. Above 100 million, warehouse compute cost matters.
- Does your data contain PII that must be anonymized? If yes, you need pre-load transformation for that data.
- Does your team know SQL? If most of the team is strong in SQL, ELT with dbt is the natural path. If the team is stronger in Python/Java, ETL with Spark or Airflow may be more productive.
- What’s your monthly budget for data infrastructure? If it’s below EUR 500/month, the modern data stack probably isn’t justified.
- Does transformed data go to a warehouse or an operational system? If the destination is an operational system, ELT doesn’t apply.
If the destination is a data warehouse or a data lake, the loading pattern directly shapes the architecture. The answers to these five questions define the pattern. Not the trend, not the Snowflake blog, not what your neighbor uses. Your data, your team, your budget.
About the author
abemon engineering
Engineering team
Multidisciplinary engineering, data and AI team headquartered in the Canary Islands. We build, deploy and operate custom software solutions for companies at any scale.
