dbt: a tool for ETL and ELT

15
min
Created in:
Jan 27, 2022
Updated:
8/12/2024

dbt is a data build tool used in ETL and ELT processes by analytics professionals in the operation of information through the SQL query language.

Without dbt, a company's data arrives "raw" to it, in various formats, whether structured or unstructured. Not to mention, the sources of this data are also varied, as are the methods of ingestion.

That's why, before it's stored in a data warehouse (DW) or data lake (DL), this data needs to be transformed to be truly useful and reliable. That's where dbt comes in. 😉

And with this article, you will know a little more about this tool. We'll tell you about the journey of Loft's data team to implementing dbt in their modern approach to data.

Happy reading!

What is dbt?

The dbt (data build tool) is a data transformation tool used in ETL and ELT processes, in which analytics professionals operate information through the SQL query language.

It is one of the main tools used during the data transformation stage to assist in the construction of a DW and, therefore, it is so important in information processing processes.

Advantages of dbt

The two main advantages of dbt are:

  1. the fact that it is open source, making it easy to implement and use generally;
  2. and its community, which is very active, present and always willing to assist in solving problems.

And now, let's get to the practice.

We want to show you how you can reinvent your processes with modern data technologies, just like Loft did using dbt.

Why did Loft choose dbt?

Loft is one of the largest companies in the real estate market, especially for bringing innovative e-commerce ideas to the real estate industry. This movement of taking the purchase and sale of real estate to online commerce was one of the great challenges overcome by the company with dbt.

To make this happen, Loft relied on the following data teams:

  • Business Analytics — Responsible for defining data concepts and making information available to the rest of the company in the form of tables and dashboards, which are intuitive and interactive dashboards;
  • Data Engineering — a team focused on creating the data infrastructure and its functionalities so that operations teams can interact with it at each stage of the project flow, i.e., with the data pipeline;
  • Data Science — a team separated from the others, allocated to specific groups to solve problems such as: how to price real estate, how to allocate brokers for real estate visits, etc.

In the beginning, however, the technologies used were not providing the answers to the demands of the company, which was growing faster and faster. Until then, Loft used Python, orchestrated data with CronJob and managed databases with MySQL, using countless spreadsheets and very little Github. 😅

These tools were not performing their functions, which implied failures that hindered the ETL and ELT processes, as well as other annoyances, such as:

  • business logic hidden in poorly executed Python codes;
  • lack of collaboration due to the low use of Github;
  • script failures hindering BI (business intelligence), reducing trust, and requiring manual interventions;
  • and lack of evolution in processes.

But one of the biggest problems was the use of Pandas (Python), which did not offer testable code, creating confusion for other contributors to understand the projects.

How did Loft solve this?

With dbt's good software engineering practices.

In addition to the reported issues, other issues were important in the company's decision to migrate to dbt. See:

  • Paradox in governance — some data sources had many owners and others had none;
  • notebooks — the equipment did not offer versioning control or approval and review processes;
  • Changes to the data—broke the unknown dependencies.

But it wasn't all bad! There were also positives that the company wanted to keep, such as the allocation of all data in a single location and delta tables, which was possible with dbt.

dbt implementation process at Loft

Once the main issues were identified, the Loft team began their journey in search of the ideal tool.

It was then that, through the book The Analytics Setup Guidebook, by Holistcs, in the section on the foundations of creating a modern data approach(MDS), the team found dbt, the ETL and ELT tool they were missing.

After that, the team sought to understand how the tool works through the hands-on tutorial made for new users, dbt Getting Started.

Then, it was time for testing: finding an ideal problem that bothered a lot of people without many exceptions, the Loft team began a process of trial and error.

At this stage, the dbt community was essential, as it was in spaces such as Discourse, an open-source forum, and in Slack itself, that Loft's technical team found numerous solutions and resolutions to doubts about the dbt tool.

The implementation itself was executed by the business analytics team using limited knowledge of Python and software engineering practices. It was necessary to offer dbt and Git training (the latter for the practice of collaboration in projects).

The entire process of migrating technologies in the company was focused on consistency, transferring only one model per week, and not all at the same time.

Once dbt was implemented, Loft found itself with the need to make new hires and onboarding processes, especially in the technical and leadership parts, to adapt the processes to the use and best use of the tool.

Among the results that emerged after the implementation, see below what the company highlights.

Execution

In a period of just 3 months, about 260 models were migrated and refactored. After 70% of the migration was done, dbt became the official tool of Loft's ETL and ELT processes.

Performance

With dbt, the full execution of all tables at the analytical level was done in just 50 minutes, delivering faster and more powerful performance compared to multiple parallel executions on ephemeral clusters.

Mentality

With better performance in data transformation processes, Loft started to adopt a different mindset for its workflow. Through the tests, the expectations of the data became clearer, as well as the business concepts.

Learning

Using dbt, the company saw the importance of changing its approach to data development, especially given testing that was previously not part of Loft's culture.

dbt in Loft's day-to-day life

With dbt already part of the routine, Loft found three main uses for the tool within the company:

  1. versioning control;
  2. testing and documentation;
  3. and development with modular SQL.

Below, you will learn about the main benefits of these three functions.

1. Versioning control

It allows you to version the codes and definitions of the tables in SQL, which is a great advantage for Loft, because, before dbt, there was no communication to warn about table definition changes because the company did not have a history. With versioning, you can now also test changes throughout the entire data pipeline (stream).

2. Testing and documentation

Ensures that only data that conforms to the defined requirements and relationships is uploaded to production. These more complex conditions that can be defined in terms of data transformation make the information even more reliable.

3. Development with modular SQL

The main difference for those who use SQL in dbt is that they don't need to use the SELECT command plus "the table name". Instead, you use REF, creating a dependency graph of the models and tables. When you need to change a table, you can already know what it consumes and what it depends on.

How dbt fits within the modern data stack (MDS)

The modern data stack (MDS) is a combination of people, processes, and tools, in other words, it is an approach that is not bought, but built.

Within MAS, the data team works with ETL/ELT with a focus on the transformation stage using dbt.

In a data pipeline like Loft's, dbt appears as a vital tool.

The tools used in the modern data stack require certain standards, such as versioning and governance, which are offered by dbt and which make the software fit so well into this approach.

Learn how to work with dbt at Indicium Academy

The first data education program to offer Analytics Engineering Training in Brazil, the Indicium Academy, teaches in a 7-week course, with live, recorded classes and material entirely in Portuguese, which is done in the role of analytics engineer, and this includes working with dbt.

Within the training, you learn how to perform the main responsibilities of this new role, including the operation of tools, such as dbt, and your main query language, SQL.

Connect with Indicium

We also share our knowledge in free materials available on our website in the Contents menu, on the YouTube channel and on our blog.

Sign up to receive our news and stay on top of the new Indicium Academy classes and everything about the world of data science for enterprises, including dbt! 😉

Tags:
dbt
Guides
Data lake
ETL/ELT
All
Tools
Partnerships

Bianca Santos

Redatora

Keep up to date with what's happening at Indicium by following our networks:

Prepare your organization for decades of data-driven innovation.

Connect with us to learn how we can help.