Your complete guide to dbt permissions

Madison Schott · September 23, 2022

Madison is an analytics engineer with a passion for data, entrepreneurship, writing, and education. Her goal is to teach in a way that everyone can understand – whether you're just starting out in your career or you've been working in engineering for 20 years.

Permissions aren’t new. They’re used in nearly every aspect of the software industry, including cloud infrastructures, application development, and data management, providing fine-grained access to actions and resources. Basically, permissions ensure that unauthorized users can’t access your sensitive data. 🙅

Luckily, data management systems and data warehouse tools like Amazon Redshift and Snowflake have authorization policies that allow database admins to grant custom privileges to users. With this feature, you can limit what users can access depending on their job responsibilities, helping you to secure your data environment. 

If you’re a dbt-lover like us, you’ll be happy to hear that, like Redshift and Snowflake, dbt also offers various solutions to help you protect access to your most important data assets. If you’re new to dbt, here’s the TLDR: It allows data analysts to transform raw data within their data warehouse into refined datasets that reflect the core business logic. 

dbt is powerful because it allows users to write modular code, broken up into multiple reusable models, and provides an easy way to document, test, and snapshot your data models. The best part? All that power translates to higher-quality datasets to be used by the business. ⬆️

In this article, we’re tackling the importance of permissions within dbt and how to configure them for the open-source product (using the profiles.yml file) and dbt Cloud. 

The importance of dbt permissions

Because dbt is so intertwined in your modern data stack, it’s crucial to assign the correct permissions to the tools that it touches as well as the users modeling your data.

Data warehouse 

dbt connects directly to your data warehouse, meaning it has access to all of your raw and transformed data. Configuring permissions for what it can and cannot do is essential for maintaining the integrity of your one source of truth: Your data warehouse. ⭐

So, you need to consider which databases, schemas, and tables dbt is allowed to read from and write to. You need to give the tool enough permissions to properly transform your data while minimizing potential problems. 

For example, dbt should never be allowed to write to your raw data tables. These should only ever be read from and remain otherwise untouched. ✋ However, for data tables that are produced by dbt itself, dbt needs permission to both read and write to them in order to create, modify, and reference the tables.
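
To make this concrete, here's a rough sketch in Postgres syntax (the raw and analytics schema names and the dbt_role role are hypothetical):

-- dbt's role can read the raw source data...
GRANT USAGE ON SCHEMA raw TO dbt_role;
GRANT SELECT ON ALL TABLES IN SCHEMA raw TO dbt_role;

-- ...while write privileges are reserved for the schema holding dbt's own models
GRANT USAGE, CREATE ON SCHEMA analytics TO dbt_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA analytics TO dbt_role;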

Users

Like tools, users should only have the minimum access required to do their jobs. Depending on a user's function within the data team, they may be assigned different permissions within the dbt environment. 

For example, a data analyst may only be allowed to read from various data tables while an analytics engineer can both read from and write to tables produced by dbt models. Similarly, you may have one person on your team who is well-versed in GitHub, so they should be the one with permission to control GitHub access in dbt Cloud.
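
In Postgres, that analyst/engineer split might look something like this (a sketch; the role names and the analytics schema are hypothetical):

-- a data analyst role can only read the tables dbt produces
GRANT USAGE ON SCHEMA analytics TO analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO analyst;

-- an analytics engineer role can also create and modify tables there
GRANT USAGE, CREATE ON SCHEMA analytics TO analytics_engineer;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA analytics TO analytics_engineer;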

As a rule of thumb: It is always better to under-assign permissions and add them as needed so that a user only ever has access to what they should. 👍 Otherwise, a user may accidentally delete a row of data or modify a data model without even realizing it. In general, limiting user access prevents mistakes that would otherwise occur.

Environments

A dbt project's profile contains different targets depending on the environment in which your data models are created. At the very minimum, you should have one for development and one for production, and each of these targets should only have access to the databases corresponding to its environment.

You don’t want to write development code to production accidentally or, on the flip side, send production code to your development database. Separating permissions based on the environment will ensure you always have the correct data models available. ✅

How to configure dbt permissions

Permissions in dbt can vary depending on whether you use the open-source tool or the cloud version. Below, we'll review setting up your profiles.yml within dbt as well as configuring specific access controls in dbt Cloud.

Open source tool

Within dbt, your profiles.yml file controls the permissions specific to your data warehouse. This is where you create a profile for use in both development and production environments and provide each environment with the correct data warehouse credentials. You should create a target for every environment you wish to write your data models to (usually dev and prod).

Under each target, you’ll input the database name you wish to write to, a default schema, and your credentials. I highly recommend creating a user within your data warehouse specifically for dbt so you can control dbt permissions from your warehouse, too. Remember, you always want to give a user the minimum access to complete their job – and doing this will allow for that. 

For instance, I call mine dbt_user and assign it the role of transformer. This transformer role will need read access to the dbt data model referenced tables in the data warehouse. It will also need write access to the tables you update and create with dbt. 
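
In Postgres, for example, that setup might look something like this (a sketch; the raw schema name is illustrative, while core matches the schema used in the profile below):

-- create a dedicated role and user for dbt
CREATE ROLE transformer;
CREATE USER dbt_user WITH PASSWORD '<password>';
GRANT transformer TO dbt_user;

-- read access to the source tables referenced by dbt models
GRANT USAGE ON SCHEMA raw TO transformer;
GRANT SELECT ON ALL TABLES IN SCHEMA raw TO transformer;

-- write access to the schema where dbt creates and updates tables
GRANT USAGE, CREATE ON SCHEMA core TO transformer;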

Once the permissions dbt needs are assigned to the transformer role and the role is granted to the user, your profiles.yml development target should look something like this 👇

# example profiles.yml file
madison_example:
  target: dev  # the default target dbt runs against
  outputs:
    dev:
      type: postgres
      host: localhost
      user: dbt_user  # the dedicated warehouse user created for dbt
      password: <password>  # placeholder (use env_var() or a secrets manager in practice)
      port: 5432
      dbname: data_mart_dev
      schema: core

Notice how I’m using the dbt_user I created for this target. This dbt_user has read and write permissions to different tables within the data_mart_dev database because of its assigned role as a transformer. Again (just for good measure), your user needs to have read permissions to the tables it is referencing in your models and write permissions to those you are updating or creating. 

dbt also uses the target's schema as the base schema for the models it builds. For example, if your target schema were dev, the schema created by dbt would be dev; if your target schema were prod, the schema created would be called prod.
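
For instance, a prod target in the same profile might point at a separate database entirely (a sketch; the host placeholder and database name are illustrative):

    prod:
      type: postgres
      host: <production-host>
      user: dbt_user
      password: <password>
      port: 5432
      dbname: data_mart_prod
      schema: prod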

This allows you to group your models based on the environment that you're using. You can also add custom schemas to create even more granular schema names. To do so, add a schema configuration to the top of your dbt model.
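
A minimal sketch (the model file and schema name here are hypothetical):

-- models/marketing/customer_orders.sql
{{ config(schema='marketing') }}

select * from {{ ref('stg_orders') }}

By default, dbt appends the custom schema to the target schema rather than replacing it.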

In this case, your resulting schema would be dev_marketing. These custom schemas allow you to properly configure your data warehouse so you know exactly what tables are being created for various parts of the business (and to make sure that they’re in the proper environment).

dbt Cloud

dbt Cloud provides fine-grained permission control, allowing admins to set permissions for nearly every action that can be performed in dbt Cloud.

It uses two varieties of access control to manage user-level access in an account: license-based and role-based.

Many of the actions performed in dbt Cloud are sensitive, so not every team member should have elevated privileges. Restricting actions based on users' licenses and roles increases the security of your dbt Cloud application and your data warehouse. With different levels of access control, you know exactly which actions a user can and cannot perform. 🚦

License-based Access Control

License-based access control configures user privileges via account-wide license types, which control access to specific parts of the dbt Cloud application. When a user is invited or added to a project, they are granted a specific license type (and can only hold one license at a time).

There are two categories of license types:

  • Read-Only limits users to read permissions across all dbt Cloud resources, regardless of any role-based permissions that have been assigned to them.
  • Developer users are not limited to read permissions and can be granted any combination of permissions for performing actions in the dbt Cloud application.

For a project with a small number of users, you can limit permissions to license-based access control alone. However, if your project requires fine-grained access control for each user, you should consider role-based access control.

Role-based Access Control

While role-based access control is more secure, keep in mind that it's only available for dbt Cloud accounts on the Enterprise plan. Role-based access control allows you to set fine-grained permissions in dbt Cloud, whereby you can assign users different permissions across the projects within a dbt Cloud account.

These role-based permissions are applied to groups (collections of users), and a user can belong to multiple groups. For example, one user can be in the analysis and security groups, while another can be in the billing, security, and data groups. Users in a group inherit the permissions applied to that group.

To apply permissions to a group, you need permission sets. According to the documentation, permission sets are “predefined collections of granular permissions.”

Each permission set combines low-level permissions (such as the ability to create, view, or delete resources) into a role that can then be assigned to a group. dbt Cloud supports a number of predefined permission sets, including account admin, account viewer, Git admin, database admin, and analyst – each with its own granular permissions configured.

For instance, the Git admin permission set controls who can see project details, modify repositories, and view different environments. Only someone with a high-level understanding of Git should have permissions like this.

Giving someone who doesn’t have Git knowledge this permission could result in poorly created repositories or accidental deletions. So, when using these predefined permissions, be sure you understand exactly what each entails before assigning it to a user. 💯

Control who accesses what data

Configuring permissions within dbt allows you to control who accesses different data assets. It prevents someone from deleting an important data model, altering data in a table, or accidentally pushing a development model to production.

In the open-source dbt tool, you use targets within the profiles.yml file to control access to the data in your data warehouse. In dbt Cloud, there are two major ways of configuring dbt permissions: License-based access control and role-based access control.

With the right permissions in place for your warehouse, environments, and users, you can create a development environment that your team feels safe using — while still protecting the quality of your data.

dbt is a powerful tool when used correctly. Consider permissions from the very start, and you won't need to stress about permission-caused errors. 😮‍💨

👉 Then connect your dbt models directly with Census to streamline your sales and marketing tools! Book a demo with one of our product specialists to learn how we work with dbt to operationalize your analytics.
