
I still remember the first time I came across dbt. It was sometime in mid-2019 and, like most data consulting companies at the time, our work was mostly code-based ETL pipelines built on lower-level cloud infrastructure, the kind you could find in most startup engineering blogs. Code-based ETL before dbt required experienced data engineers, and as a relatively small consultancy in a very hot data engineering market, we found it difficult to scale our team without a different approach. Most of our team's background was not in engineering at the time, either. Our CTO was adamant about applying proper software engineering best practices to data, drawing on his experience with big-data pipelines in his previous role. After reading some of Tristan Handy's blog posts about Analytics Engineering and dbt, it became clear to us that dbt could be the missing piece that would enable our analysts to work like engineers, or, to put it simply, to become analytics engineers.
In hindsight, the genius of the early versions of dbt was not the complexity of its code or features, but rather its simplicity. Most legacy ETL tools that cater to non-engineering professionals, such as Informatica or Pentaho, were clunky, full of distracting features, and, worst of all, offered almost no support for the software engineering best practices that are a must for modern data work. On the other hand, working with modern data platforms such as Snowflake and Databricks requires much deeper technical knowledge than the typical data analyst has, making it a data-engineer-only realm. That meant that for most companies, despite being able to build data pipelines orders of magnitude faster than the previous technology allowed, there was a real constraint on scaling the data org, since there were so few professionals who could work on it. Worse, many data engineers dislike talking to business users, or even writing SQL queries at all, so the data organization inevitably stayed far from the lines of business, where the business value of data lives.
Despite dbt still being in its early stages, within a few months we built an entirely new analytics engineering practice on top of it, made up of professionals without a software engineering background but with very good analytical skills. To accelerate that movement, we developed our analytics engineering course, open to the public, which has since trained more than 1,000 analytics engineers who work at Indicium, at our customers, or at many other companies. To date, we are among the top certified dbt partners worldwide. There is no doubt that dbt is a big deal for any modern data team.
What About dbt Cloud?
Well, for many early adopters like us, dbt Core was already good enough for our work. Many of the features launched with the first versions of dbt Cloud had also already been developed by our platform teams or by the open-source community, so up until recently it was, in some ways, fine to just stick with dbt Core. And don't get me wrong, a lot of those features are needed for dbt Cloud to be a good tool in its own right. The problem for dbt Labs was that for many companies adopting dbt, as they left Plato's cave of modern data stack ignorance, there were so many possibilities to improve their data platform best practices with dbt that most platform teams became advanced users of dbt, which, in my opinion, was not the main user persona for dbt Cloud. But then, who is?
I believe there are three main personas for dbt Cloud: a) companies that are born into the modern data stack and don't have, or don't want to keep, a large data team; b) enterprise companies that want to scale their dbt Core implementation into the lines of business and want a tool that lets them implement data management and data governance best practices while keeping complexity low for less technical LOB analytics teams; and c) companies that are relatively late in adopting a cloud data warehouse and are only now migrating away from legacy data tech such as Talend and Informatica. Until now, adopting and implementing dbt Cloud wasn't always compelling enough for some of these personas. So why do I think that will change?
In my opinion, the new announcements from dbt Labs at this year's Coalesce are all in the right direction. First, dbt is acknowledging that it has to do more than just the data transformation part if it is to be the single data tool for smaller organizations and other companies without a dedicated data platform team. Features like orchestration, data cataloging, and even data ingestion are all necessary, and today they require a set of different tools that can be hard to combine and expensive to run. The vision of dbt becoming a data control plane is a good one, and it goes hand in hand with the consolidation trend we at Indicium have seen in the modern data stack space over the past few years.
One dbt Strategy
Arguably, the biggest announcement of Coalesce was the One dbt strategy. First, there is real value in a hybrid approach of dbt Core and dbt Cloud, with the former developed by platform or CoE-style teams and the latter focused on less technical LOB teams. A first-class experience for this hybrid approach in dbt Cloud is a must-have for many of our enterprise customers. Second, while most advanced features of dbt Cloud had already been built internally by dbt power users, that is not the case for hybrid cloud and data mesh architectures. There is no single tool or platform that can deal with this ever more common enterprise practice, even when the platforms run on the same cloud provider (e.g., Databricks and Snowflake side by side). With Iceberg becoming the de facto standard for modern data storage, there is a real opportunity for dbt to become the missing piece between those data platforms, allowing teams to work with their platform of choice without losing governance and DataOps best practices. Finally, while there is a long-standing conundrum between code-based and no-code/low-code development for data transformation, a low-code experience is a must-have for less technically minded users and a very common enterprise requirement. Having this capability inside dbt Cloud, integrated with the dbt development lifecycle, is a good move by dbt Labs.
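To make the Iceberg point a bit more concrete, here is a minimal sketch of what that cross-platform story can look like from inside a dbt project, assuming dbt-snowflake's Iceberg table support (the table_format and external_volume configs); the external volume name and the referenced staging models are hypothetical placeholders, not a recommendation for any specific setup. The model is written as ordinary dbt SQL but materialized as an Iceberg table in open storage, where another engine such as Databricks can read it without copying the data.

```sql
-- models/marts/fct_orders.sql
-- Minimal sketch: a dbt model materialized as an Iceberg table on Snowflake.
-- Assumes dbt-snowflake's Iceberg support; `my_external_volume` and the
-- referenced staging models are hypothetical placeholders.
{{
    config(
        materialized = 'table',
        table_format = 'iceberg',
        external_volume = 'my_external_volume'
    )
}}

select
    o.order_id,
    o.customer_id,
    o.order_date,
    sum(i.amount) as order_total
from {{ ref('stg_orders') }} as o
left join {{ ref('stg_order_items') }} as i
    on o.order_id = i.order_id
group by 1, 2, 3
```

Because the output lands in an open table format, governance, tests, and documentation stay in one dbt project while downstream teams keep their engine of choice.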
I'm confident that dbt is the most ubiquitous tool in the modern data platform. More than just a tool, dbt allowed companies to close the gap between business and data through the rise of the Analytics Engineering role. For dbt Labs, ironically, its dbt Cloud product suffered from the qualities of its original product: while there were always companies where dbt Cloud was the best fit, a large part of the market found it hard to identify where dbt Core was lacking. With the new strategy and release announcements, dbt Cloud is addressing real technical and business needs that dbt Core cannot serve, and I can see more and more use cases where dbt Cloud provides a compelling advantage over running dbt Core alone.
Contact us and discover how we can leverage dbt’s potential to accelerate your company’s growth.

Daniel Avancini
Chief Data Officer