dbt: The fast-growing data transformation platform
In the world of data and analytics engineering, there is a platform that has been gaining followers and generating great interest for some time: dbt (Data Build Tool).
In this post I’ll briefly describe what dbt is, what it is used for, who uses it, and its main advantages.
I also want to make it clear that I’m not an expert in dbt. In fact, I’ve only been learning about the platform for a few weeks. My first impressions have been very good, and I have set expanding my knowledge of the platform as one of my short-to-medium-term goals, as I find it a very interesting way to keep building bridges between data analysts and data engineers, and also because of its growing popularity across different companies and industries.
What is dbt (Data Build Tool)?
dbt (Data Build Tool) is a platform designed to facilitate data transformation inside data warehouses. It’s focused on scalability and ease of use, making it possible to take raw data and turn it into structured, analysis-ready data more efficiently and quickly.
Everything in dbt is based on SQL and YAML files. This might be a barrier for people not used to writing SQL, but it’s very useful for companies and teams looking for a centralized, highly performant data preparation platform that also allows them to run data quality tests while documenting the entire process, all in a single place. The platform also gives a complete view of data lineage, from the original source to the final, fine-tuned state consumed by analysts and other data consumers.
What is dbt used for?
dbt is mainly used for data transformation, testing, and centralized documentation of these processes. It connects directly to a data warehouse so that, starting from raw data, it can perform all the transformations that are common in data engineering processes: aggregations, calculations, and data cleansing.
dbt’s approach is to move from traditional ETL (Extract, Transform, Load) processes, in which data is extracted from a system, transformed, and then loaded into a data warehouse, to an ELT (Extract, Load, Transform) approach.
The idea of this ELT approach is to make it easier for businesses to use data. Thanks to the capabilities of modern data lakes and data warehouses, data doesn’t need to be transformed before reaching the warehouse. So the ELT approach loads raw data into the data warehouse directly from the source systems. Then, analytics engineers use dbt to transform (the T in ELT) the raw data through SQL models that can reference one another, reducing redundancy and complexity.
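To make this concrete, here is a minimal sketch of what dbt models look like. The model, source, and column names below are hypothetical, but the pattern is real: each model is a SQL SELECT statement in the project’s `models/` directory, and dbt’s `source()` and `ref()` Jinja functions wire models together so that downstream models reuse upstream ones instead of repeating their logic.

```sql
-- models/staging/stg_orders.sql (hypothetical model)
-- Cleans raw order data that was loaded as-is into the warehouse.
select
    order_id,
    customer_id,
    lower(status) as status,
    cast(ordered_at as date) as order_date
from {{ source('shop', 'raw_orders') }}
where order_id is not null
```

```sql
-- models/marts/orders_per_customer.sql (hypothetical model)
-- Builds on the staging model via ref() instead of re-cleaning raw data.
select
    customer_id,
    count(*) as total_orders
from {{ ref('stg_orders') }}
group by customer_id
```

Because `orders_per_customer` depends on `stg_orders` through `ref()`, dbt automatically infers the execution order and the lineage graph between the two models.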
Why perform this transformation directly in the data warehouse, after extraction and loading? Because it allows more control over the entire process, and keeping the raw data at hand allows more flexibility for future use cases. By always having raw data in the data warehouse, transformations can be adapted, updated, and documented as needs change within the organization.
Who uses dbt?
dbt is being used more and more by data engineers, data analysts, data scientists, and by a professional role that is becoming increasingly popular: analytics engineers. Analytics engineers sit halfway between data engineers and analysts. These professionals use dbt to perform complex data transformation tasks more efficiently, making it easier to manage data and make it available to the business with greater agility.
Main advantages of dbt
The main advantages of dbt are:
- Simplicity: dbt is based on SQL, a language very familiar to most data professionals, making it accessible and easy to learn.
- Version control: dbt integrates with version control systems such as Git, making it easy to collaborate and monitor changes in data transformation processes.
- Scalability: The SQL and YAML-based approach allows data professionals to start with very specific use cases and grow from there as the company’s analytics also grows.
- Modularity: dbt is based on a modular development philosophy. That is, data transformations are developed as modules that can be reused, which facilitates the maintenance and quality control of all processes.
- Documentation and Testing: dbt automatically generates documentation of the data transformation models and lets you run data-quality tests on data sources and individual columns, ensuring the data complies with the company’s requirements and needs.
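The documentation and testing described above are declared in YAML files alongside the models. As a minimal sketch (the model and column names are hypothetical), the snippet below uses dbt’s built-in generic tests, `unique`, `not_null`, and `accepted_values`, and the `description` fields feed the auto-generated documentation:

```yaml
# models/staging/schema.yml (hypothetical file)
version: 2

models:
  - name: stg_orders
    description: "Cleaned order data from the raw orders source."
    columns:
      - name: order_id
        description: "Primary key for orders."
        tests:
          - unique
          - not_null
      - name: status
        description: "Lowercased order status."
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
```

Running `dbt test` executes these checks against the warehouse, and `dbt docs generate` turns the same file into browsable documentation.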
Conclusion
In short, dbt facilitates the centralization of data transformation processes, all through SQL, allowing greater control, better management, and faster documentation of these transformations. This streamlines the availability of data that meets business requirements for analysis and use.
It’s also growing fast due to the increasing need among data professionals and companies for greater control and centralization of data quality and transformation processes, as well as the rise of professionals specialized in these tasks who sit between data engineers and analysts: analytics engineers. dbt also has a very active Slack community that anyone can join to learn from others and get support.