Modern Data Stack: a modern approach to data
The Modern Data Stack (MDS) is a strategy for building modern infrastructures that help companies overcome data integration, organization and management challenges.
Companies of all sizes already understand the power of data and recognize its importance to the business, but many do not know how to overcome the challenges of organizing, integrating and managing that information.
This is where the Modern Data Stack (MDS), or modern data approach, comes in: a concept that has arrived to revolutionize and modernize companies’ data infrastructure.
Organizations that want to grow and remain competitive need to invest in a robust data infrastructure, capable of managing large volumes of information. This can be done with the Modern Data Stack.
In this post, you will find a clear, concise explanation of this approach, which we use here at Indicium.
What the Modern Data Stack is
The MDS is a new combination of best practices and tools for creating data infrastructures.
One of its most striking features is the combination of several open-source tools to respond to the demands of a complex data infrastructure in a highly efficient way.
What does this mean in practice?
With a Modern Data Stack, you can combine tools that perform different functions, such as integrating, storing or visualizing data, to create a modern, adaptable and more independent data structure.
For example, consider a company that has drastically increased its customer base and needs to expand its data storage solution.
If it uses the MDS, it will have two options:
- adapt its current solution to the new demands.
- replace it with another tool that meets its needs, without having to completely redesign its data infrastructure.
In other words, with the Modern Data Stack, organizations have more flexibility to make specific adjustments and reinvent their structure without having to completely transform it. The result?
Lower costs, more scalability and more autonomy.
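To make the idea of replacing one tool without redesigning the rest more concrete, here is a minimal Python sketch. Everything in it is hypothetical and only for illustration: the `Storage` interface, the two storage classes and the `count_customers` function are stand-ins, not the API of any real MDS tool. The point is only that the rest of the pipeline depends on a small interface, so the component behind it can change.

```python
from typing import Protocol


class Storage(Protocol):
    """Hypothetical interface that any storage component must satisfy."""

    def save(self, table: str, rows: list[dict]) -> None: ...

    def load(self, table: str) -> list[dict]: ...


class SmallStorage:
    """Stand-in for the company's current storage tool."""

    def __init__(self) -> None:
        self._tables: dict[str, list[dict]] = {}

    def save(self, table: str, rows: list[dict]) -> None:
        self._tables.setdefault(table, []).extend(rows)

    def load(self, table: str) -> list[dict]:
        return list(self._tables.get(table, []))


class ScalableStorage:
    """Stand-in for a replacement tool that spreads rows over partitions."""

    def __init__(self, partitions: int = 4) -> None:
        self._partitions: list[dict[str, list[dict]]] = [
            {} for _ in range(partitions)
        ]

    def save(self, table: str, rows: list[dict]) -> None:
        # Distribute rows round-robin across the partitions.
        for i, row in enumerate(rows):
            part = self._partitions[i % len(self._partitions)]
            part.setdefault(table, []).append(row)

    def load(self, table: str) -> list[dict]:
        return [row for part in self._partitions for row in part.get(table, [])]


def count_customers(storage: Storage, customers: list[dict]) -> int:
    """Pipeline step that only knows the Storage interface, not the tool."""
    storage.save("customers", customers)
    return len(storage.load("customers"))


customers = [{"name": "Ana"}, {"name": "Bruno"}, {"name": "Carla"}]
before = count_customers(SmallStorage(), customers)
after = count_customers(ScalableStorage(), customers)
```

The same `count_customers` step runs unchanged against both storage classes, which is the kind of targeted replacement described above.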
Today, thanks to new technologies and accessible tools, it is much easier to adopt the MDS.
However, to be successful in implementing these practices, you need to understand how all the pieces fit together.
Building a Modern Data Stack
An efficient data structure combines several services into a data stack.
Overall, a data stack has three fundamental functions:
- collecting and integrating data into a data warehouse (a “home” for the data).
- cleaning the data and transforming it into information.
- adding value to decision-making through intuitive visualizations such as BI dashboards.
All of these functions are processes in a data pipeline (a flow through which data enter, are processed and transformed).
The tools used for each of these processes form the data stack. And, although the architecture of a pipeline varies according to each company, all data pipelines have these processes incorporated.
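As a rough illustration of these three functions chained into one pipeline, consider the Python sketch below. The function names, the sample data and the text "dashboard" are all assumptions made for the example; in a real stack each step would be handled by a dedicated tool.

```python
def collect(sources: dict[str, list[dict]]) -> list[dict]:
    """Collect rows from every source into one raw table, tagging their origin."""
    return [
        {**row, "source": name}
        for name, rows in sources.items()
        for row in rows
    ]


def transform(raw: list[dict]) -> dict[str, float]:
    """Clean the raw rows and aggregate the amount per customer."""
    totals: dict[str, float] = {}
    for row in raw:
        if row.get("amount") is None:
            continue  # Drop rows without a usable amount.
        customer = row["customer"].strip().lower()  # Normalize the name.
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return totals


def visualize(totals: dict[str, float]) -> str:
    """Render a plain-text bar chart as a stand-in for a BI dashboard."""
    lines = []
    for customer, total in sorted(totals.items()):
        bar = "#" * int(total // 10)
        lines.append(f"{customer:<8}{bar} {total:.2f}")
    return "\n".join(lines)


# Hypothetical sample data from two isolated sources.
sources = {
    "crm": [
        {"customer": "Acme ", "amount": 30},
        {"customer": "globex", "amount": None},
    ],
    "erp": [
        {"customer": "acme", "amount": "20.5"},
        {"customer": "Globex", "amount": 40},
    ],
}

raw = collect(sources)
totals = transform(raw)
report = visualize(totals)
print(report)
```

Each function corresponds to one of the three functions listed above, and the output of one stage is the input of the next.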
To make the MDS clearer, below we present, grouped by process, the main tools available on the market, which are used successfully in thousands of data projects of all sizes in Brazil and abroad.
1) Data collection and integration
Making data available from multiple isolated sources for analysis is one of the main challenges of data projects. To overcome this, you need to invest in data collection and integration.
Tools like Fivetran and Stitch are among the leaders in cloud data integration. They let you move data from hundreds of sources, such as ERPs, CRMs, databases and REST APIs, directly to a data warehouse (cloud-based or on-premises). They can also be combined.
Therefore, there is no need for large investments in software licenses or implementation hours.
Additionally, companies looking to collect more accurate data online and offline can also use Segment or Snowplow to get a complete view of their customers.
2) Data warehouse
Another fundamental step of the Modern Data Stack is the transformation of raw data into modeled data, which occurs within a data warehouse (DW).
Centralizing data transformations in the DW brings large efficiency gains to the project, especially with the ELT (extract, load, transform) approach. ELT makes the pipeline more flexible and gives business analysts the autonomy to define business rules directly in the DW, which can accelerate the project by months.
In a data warehouse, the two main Modern Data Stack tools used for data transformation are dbt and Dataform.
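The sketch below illustrates the ELT idea in miniature: raw data is loaded first, and the business rule is applied afterwards as SQL inside the warehouse. It is only an approximation under stated assumptions: Python's built-in `sqlite3` module stands in for a real data warehouse, the table names, columns and the "only paid orders count" rule are hypothetical, and the transformation is written as plain SQL rather than with dbt or Dataform.

```python
import sqlite3


def load_raw(conn: sqlite3.Connection, orders: list[tuple]) -> None:
    """Load step: store the extracted rows as-is, without cleaning them."""
    conn.execute(
        "CREATE TABLE raw_orders (customer TEXT, amount TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", orders)


def transform_in_warehouse(conn: sqlite3.Connection) -> None:
    """Transform step: the business rule lives in SQL, inside the warehouse."""
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT LOWER(TRIM(customer)) AS customer,
               SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        WHERE status = 'paid'
        GROUP BY LOWER(TRIM(customer))
        """
    )


conn = sqlite3.connect(":memory:")  # In-memory stand-in for a cloud DW.
load_raw(
    conn,
    [
        (" Acme", "30", "paid"),
        ("acme", "20.5", "paid"),
        ("Globex", "40", "cancelled"),
    ],
)
transform_in_warehouse(conn)

revenue = conn.execute(
    "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

Because the raw table is kept untouched, an analyst can change the rule in the SQL and rebuild `customer_revenue` without re-extracting anything from the sources.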
Another recent and essential innovation in this approach is the cloud DW, such as Amazon Redshift and Google BigQuery, whose scalable architecture allows you to quickly store and query huge volumes of data.
3) Business intelligence (BI)
Analytical intelligence is a priority in the Modern Data Stack.
With a modern data infrastructure in place, you can use different business intelligence tools to visualize, analyze and generate insights from data.
There are several robust alternatives for this: open-source tools such as Metabase, and SaaS platforms such as Microsoft Power BI, Looker and Tableau, among others.
Important: in the modern approach, BI is not an end in itself; it must quickly generate value for the company.
4) Machine learning
Machine learning, artificial intelligence and modeling are advanced analytics techniques applied to more complex analyses within the data stack.
To this end, in addition to the many libraries available in R and Python, tools such as MLflow and Kedro help run predictive and prescriptive models and streamline the development process. They reduce the time between modeling and production use, the Achilles heel of any advanced analytics project.
5) Deployment
Tools such as Docker and Kubernetes are widely used for deployment, in conjunction with orchestrators such as Airflow and Prefect.
What sets these technologies apart is that all the “Lego pieces” communicate smoothly with each other, ensuring that data flows consistently throughout the data structure.
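To give a sense of what an orchestrator does, the sketch below runs a set of pipeline tasks in an order that respects their dependencies. It is a simplified stand-in built on Python's standard `graphlib` module, not the API of Airflow or Prefect, and the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(
    tasks: dict[str, Callable[[], None]],
    dependencies: dict[str, set[str]],
) -> list[str]:
    """Run each task only after the tasks it depends on; return the order."""
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        order.append(name)
    return order


log: list[str] = []

# Each task just records that it ran; real tasks would call the stack's tools.
tasks = {
    "extract": lambda: log.append("extract"),
    "load": lambda: log.append("load"),
    "transform": lambda: log.append("transform"),
    "dashboard": lambda: log.append("dashboard"),
    "train_model": lambda: log.append("train_model"),
}

# Each key lists the tasks that must finish before it can start.
dependencies = {
    "load": {"extract"},
    "transform": {"load"},
    "dashboard": {"transform"},
    "train_model": {"transform"},
}

order = run_pipeline(tasks, dependencies)
```

Here `dashboard` and `train_model` both wait for `transform`, but neither depends on the other, so their relative order is left to the scheduler. Production orchestrators add scheduling, retries and monitoring on top of this dependency logic.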
Modern Data Stack for everyone
The Modern Data Stack (MDS) is the link between raw data and business intelligence, that is, it is an integrated system of applications that collects, combines, analyzes and realizes the value of data for companies.
Adopting the MDS is essential for modern companies that want to succeed in the data era.
Fortunately, data stack components are now much cheaper and simpler to configure and operate.
Thus, companies of all sizes can use them to gain a competitive advantage and develop analytical maturity.
Would you like to learn more about the Modern Data Stack?
We have an e-book that covers all you need to know about this subject.
Understand how to optimize data operations in your company. Access your e-book here. It's free.
Daniel Avancini
Chief Data Officer
Isabela Blasi
CBDO and co-founder at Indicium