Future-Proof Your Data Stack with the Composable CDP

Sylvain Giuliani December 21, 2022

Syl is the Head of Growth & Operations at Census. He's a revenue leader and mentor with a decade of experience building go-to-market strategies for developer tools. San Francisco, California, United States

Five years ago, Customer Data Platforms (CDPs) started gaining popularity by marketing themselves as an all-in-one solution to collect, unify, and activate customer data. But let's be honest: They failed.

Customers who adopted off-the-shelf CDPs have long struggled with rigid data models, long onboarding times, and redundancies across analytics and marketing tools. Only 1% of companies are actually meeting their current and future needs with a CDP. 😬

Now, the rise of the modern data stack and data activation means that companies have access to best-in-class solutions for each component of an off-the-shelf CDP. As a result, the Composable CDP has emerged as an ideal solution for data-forward companies looking to maximize their existing data investments. 📈

In this post, we’ll discuss off-the-shelf CDPs, the rise of data activation, and how to turn your existing data platform into a Composable CDP to start getting value in your organization today.

But first, WTF is a Customer Data Platform and why do people buy them?

A Customer Data Platform (CDP) is an all-in-one platform built for marketing teams to collect, transform and activate customer data. The number one reason why marketing teams are rushing out to buy CDPs is to get a unified view of their customer, with 63% of marketers saying that unifying data is their key challenge. 

There’s a lot of variety within the CDP category, but generally, all CDPs have (at least) these three components: Data collection, data transformation, and data activation.

You might be thinking, “Hey, these components sound a lot like those of the modern data stack,” and you’d be right. A CDP’s functionality overlaps heavily with that of the company’s existing data platform. So... why do data teams want them?

In our experience, there are two scenarios where data teams are looking at CDPs 👇

  • For their event collection features. CDPs like Twilio Segment started as event collection and tag management tools. As a result, they offer incredibly developer-friendly APIs that make it really easy to track new events. The biggest downside is cost: because they are all-in-one solutions with volume-based pricing tied to Monthly Tracked Users, they quickly become very expensive for companies with many customers, especially when more cost-effective event collection options like Snowplow are on the market.
  • Because their marketing team wants to buy one for data activation purposes and needs the data team as part of the evaluation and implementation process.

The biggest problem: CDPs are yet another copy of your data

The biggest problem with CDPs is that they claim to be the single source of truth for customer data, but they do not (and cannot) replace data warehouses. Why? Simply because CDPs don’t have all the customer and company-level information to be a complete source of truth for analytics.

Instead, data collected by CDPs is copied to the data warehouse to power trusted analytics, instantly diluting the CDPs' promise of a “single source of truth.” While some CDPs now support importing data from the data warehouse, doing so adds data latency and doesn’t change the fact that the CDP is just another silo of data. 🙄 Due to this fragmentation, data inside the CDP is not as trusted or fresh as data in the warehouse.

Why you don’t actually need a CDP (if you have a data warehouse)

If there’s one tool that can claim that it’s the “single source of truth” for customer data, it’s the humble data warehouse. Most growing companies are already investing heavily in data warehouses to power analytics and reporting. Plus, advancements in the modern data stack with transformation tools like dbt and the DataOps movement have made the data warehouse a hub to operationalize data.

In fact, the data warehouse has become the place where:

✅ All company and customer data lives

✅ Data is most trusted as it’s maintained by data teams

✅ It’s well governed (by data teams)

✅ Data is secure by default


At Census, we believe the data warehouse has won as the source of truth for customer data. We also believe that many of the components of a CDP already exist inside a company’s data platform. So, let’s break it down 👇

  • Data Collection. Engineering and data teams have already set up robust pipelines to collect vast quantities of behavioral (e.g. product, web analytics), operational (CRM), and transactional (orders, subscriptions) data on customers. This data is either streamed directly into your data warehouse (in the case of behavioral data) or consolidated in the data warehouse via ETL.
  • Data Transformation. Once the data is in the warehouse, data teams are already cleaning, joining, enriching, and flattening this data to make it consumable by analytical tools. For example, data teams need to unify many different tables of customer data across the organization in order to break down Customer Lifetime Value (CLV) by each individual customer ID. 
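
To make the transformation step concrete, here is a minimal sketch in Python of unifying a few customer tables and rolling revenue up to each customer ID. Everything in it is an assumption for illustration: the table names, the columns, and the simplified "lifetime value" (historical revenue summed per customer, not a predictive CLV model). In practice this logic would more likely live in SQL models maintained by the data team.

```python
from collections import defaultdict

# Hypothetical rows standing in for warehouse tables loaded by ETL.
crm_contacts = [
    {"customer_id": "c1", "email": "ana@example.com"},
    {"customer_id": "c2", "email": "bo@example.com"},
]
orders = [
    {"customer_id": "c1", "amount": 120.0},
    {"customer_id": "c1", "amount": 80.0},
    {"customer_id": "c2", "amount": 45.5},
]
subscriptions = [
    {"customer_id": "c2", "monthly_fee": 10.0, "months_active": 6},
]


def lifetime_value_by_customer(contacts, orders, subscriptions):
    """Join the three sources on customer_id and sum revenue per customer."""
    revenue = defaultdict(float)
    for order in orders:
        revenue[order["customer_id"]] += order["amount"]
    for sub in subscriptions:
        revenue[sub["customer_id"]] += sub["monthly_fee"] * sub["months_active"]
    # One flattened row per known customer, including those with no revenue.
    return [
        {
            "customer_id": c["customer_id"],
            "email": c["email"],
            "lifetime_value": round(revenue[c["customer_id"]], 2),
        }
        for c in contacts
    ]


customer_ltv = lifetime_value_by_customer(crm_contacts, orders, subscriptions)
for row in customer_ltv:
    print(row)
```

The output is one clean row per customer, which is the shape that both dashboards and activation tools tend to want.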

So the collection and transformation pieces are already happening inside your organization – and the data warehouse has emerged as the most complete single source of truth for customer data. Today, however, the most common destination for this single source of truth is a dashboard. But dashboards are where data goes to die because actually activating those insights still requires complex pipelines and engineering effort. 

This is where warehouse-native data activation comes in. 🦸

Data activation is the missing link that connects the data team’s collection and transformation efforts to marketing teams, making that data available to activate across all their channels. Warehouse-native data activation tools like Census enable marketers to unlock data directly from the warehouse, all without needing to know SQL, while still enabling data teams to maintain the governance and control that come from being built on the data warehouse.

These warehouse-native activation tools enable marketers to:

  • Sync customer attributes and lists from their data warehouse into “systems of action” such as Salesforce Marketing Cloud, Marketo, and HubSpot, and update them in real time as the data is transformed. This pattern is known as reverse ETL.
  • Build dynamic customer (and other entity) segments by filtering across trusted data models from the data warehouse and syncing them to all their advertising and marketing automation platforms from a single place. 
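
For readers who want to picture the reverse ETL pattern, here is a minimal sketch of an incremental sync in Python. It is not how Census is implemented; it only illustrates the general idea of comparing the current warehouse rows against what was last sent and pushing just the differences. `FakeDestination`, `plan_sync`, and `run_sync` are hypothetical names, and a real destination would be an API client for a CRM or ad platform.

```python
import hashlib
import json


def row_fingerprint(row):
    """Stable hash of a row so changes can be detected between runs."""
    payload = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def plan_sync(warehouse_rows, last_synced, key="customer_id"):
    """Return (rows to upsert, keys to delete, new sync state)."""
    current = {row[key]: row_fingerprint(row) for row in warehouse_rows}
    upserts = [
        row for row in warehouse_rows
        if last_synced.get(row[key]) != current[row[key]]
    ]
    deletes = sorted(k for k in last_synced if k not in current)
    return upserts, deletes, current


class FakeDestination:
    """Hypothetical stand-in for a CRM or ad platform API client."""

    def __init__(self):
        self.records = {}

    def upsert(self, row, key="customer_id"):
        self.records[row[key]] = row

    def delete(self, record_key):
        self.records.pop(record_key, None)


def run_sync(warehouse_rows, destination, last_synced):
    """Apply only the changes since the last run and return the new state."""
    upserts, deletes, new_state = plan_sync(warehouse_rows, last_synced)
    for row in upserts:
        destination.upsert(row)
    for record_key in deletes:
        destination.delete(record_key)
    return new_state, len(upserts), len(deletes)


crm = FakeDestination()
first_rows = [
    {"customer_id": "c1", "plan": "free"},
    {"customer_id": "c2", "plan": "pro"},
]
state, upserted_1, deleted_1 = run_sync(first_rows, crm, {})

# Later, c1 upgrades and c2 drops out of the modeled table.
second_rows = [{"customer_id": "c1", "plan": "pro"}]
state, upserted_2, deleted_2 = run_sync(second_rows, crm, state)
```

Tracking a fingerprint per record keeps each run proportional to what changed, which matters once the destination enforces API rate limits.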

In conjunction, data teams can:

  • Automate reverse ETL pipelines to sync data to downstream tools used by business teams.
  • Define approved entities (or business objects) for marketing teams to activate. For example, they can define the underlying approved “users” and “events” tables that marketing teams can use for audience building.
  • Maintain full governance and control of all customer data leaving the warehouse with robust logging, observability, and access control features.
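
As a rough illustration of the governance side, the sketch below shows one way "approved entities" and an audit trail could fit together. All names here (`Entity`, `AudienceBuilder`, the `ssn_last4` column) are hypothetical and do not describe any vendor's API; the point is only that marketers filter on columns the data team has exposed, and every audience request is logged.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A warehouse model the data team has approved for activation."""
    name: str
    allowed_columns: frozenset


@dataclass
class AudienceBuilder:
    entities: dict
    audit_log: list = field(default_factory=list)

    def build(self, requested_by, entity_name, rows, column, equals):
        """Filter rows on an approved column and record who asked for what."""
        entity = self.entities.get(entity_name)
        if entity is None:
            raise PermissionError(f"{entity_name!r} is not an approved entity")
        if column not in entity.allowed_columns:
            raise PermissionError(f"{column!r} is not exposed on {entity_name!r}")
        audience = [row for row in rows if row.get(column) == equals]
        self.audit_log.append({
            "user": requested_by,
            "entity": entity_name,
            "filter": f"{column} == {equals!r}",
            "size": len(audience),
        })
        return audience


# The data team approves the "users" entity and two of its columns.
users = Entity("users", frozenset({"plan", "country"}))
builder = AudienceBuilder(entities={"users": users})

user_rows = [
    {"customer_id": "c1", "plan": "pro", "country": "US", "ssn_last4": "1234"},
    {"customer_id": "c2", "plan": "free", "country": "FR", "ssn_last4": "5678"},
]
pro_users = builder.build("marketer@example.com", "users", user_rows, "plan", "pro")
```

A request that filters on a column outside the approved set, such as `ssn_last4` here, raises `PermissionError` instead of silently returning data.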

TL;DR: Warehouse-native data activation tools (like Census) bridge the gap between marketing teams and data teams and fundamentally change the way data and marketing teams collaborate.


The Composable CDP: Bridging the gap between marketing and data teams 🤝

The composable CDP is a best-of-breed alternative to buying an expensive all-in-one CDP, aimed at data-forward organizations. Because CDPs duplicate much of the functionality of the company’s data platform, data teams can instead turn their existing data platform into a CDP by adding a data activation layer.

This alternative approach to a CDP is known by a few names: Composable CDP, unbundled CDP, headless CDP, etc. Although the name might differ depending on who you’re talking to, they are all ultimately saying the same thing: Use your data warehouse as the source of truth for customer data, and activate data where it already lives.


Building a composable CDP with data activation

Typically composable CDPs have the following components built on top of the data warehouse or lake.

Data Collection:

  • Event tracking (e.g. Snowplow): Capture rich, quality behavioral data across all platforms and channels in a common format and stream it into your data warehouse or lake. Notably, we don’t recommend Segment or RudderStack in this category, purely from a cost perspective: both companies position themselves as all-in-one solution providers and end up being more expensive down the line.
  • ETL (often Fivetran): Replicate data from your SaaS tools and databases across marketing, sales, finance/IT, product, etc. into your data warehouse.

Data Transformation:

  • dbt: Once all your raw data has landed in your data warehouse, you can use SQL to clean up and transform the data into clean tables/views. 
  • BI (e.g. Looker or Sigma): Enable business teams to aggregate or filter customer data in a visual way, or analyze insights from marketing activities.

Data Activation:

  • Census: Sync data from the data warehouse into the tools that business teams rely on (e.g. Salesforce, Marketo, Facebook Ads, etc). Enable business teams to build self-service segments and audiences leveraging all of the rich data available in the data warehouse.

Benefits of the Composable CDP

Here’s why you should use best-in-class tooling from your existing data platform to create a Composable CDP, rather than buying an expensive all-in-one CDP.

A true single source of truth for customer data ⭐

While legacy CDPs on the market only give you a partial copy of your customer data, the composable CDP is built on top of your data warehouse to provide you with a complete view of ALL your customer data. With this 360-degree view, organizations can easily leverage data from all sources, including POS systems, data science models, and even offline data to get a comprehensive understanding of their customers.

More flexibility

CDPs force customer data to conform to a rigid structure. This might appear efficient at first, but businesses come in all shapes and sizes, so they need much more flexibility than these CDPs can offer. The composable CDP is built on the data warehouse, which means it’s built on top of trusted models and transformations maintained by the data team using a tool like dbt. This flexibility allows you to represent relationships between users and entities, so your business has complete control over how you unify customer identity.

Better data governance

Rather than an off-the-shelf CDP managing all of your customer data, the composable CDP is built on top of your data warehouse so you get to leverage all your existing data governance, security, and observability protocols for managing your customer data. This is particularly important with growing privacy legislation like GDPR and CCPA mandating the customer’s right to be forgotten. It’s never been more important to know exactly where all your customer data is going, and have control over all of it. 

Future-proof by design

Composable CDPs are future-proof by design and allow you to avoid vendor lock-in. Since every element in a composable CDP is modular, you can choose the best-in-class tools that fit the requirements at that time. As the requirements of your business evolve, you can continue to invest on top of your composable CDP as opposed to implementing a new stack from scratch.
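
One way to picture this modularity is as three small interfaces, one per layer, with any conforming tool plugged in behind each. The Python sketch below is purely illustrative: the class names are hypothetical and the in-memory implementations stand in for real collection, transformation, and activation tools.

```python
from typing import Protocol


class Collector(Protocol):
    def collect(self) -> list[dict]: ...


class Transformer(Protocol):
    def transform(self, rows: list[dict]) -> list[dict]: ...


class Activator(Protocol):
    def activate(self, rows: list[dict]) -> int: ...


class ComposableCDP:
    """Wires the three layers together; each one can be swapped on its own."""

    def __init__(self, collector: Collector, transformer: Transformer, activator: Activator):
        self.collector = collector
        self.transformer = transformer
        self.activator = activator

    def run(self) -> int:
        raw = self.collector.collect()
        modeled = self.transformer.transform(raw)
        return self.activator.activate(modeled)


class InMemoryEvents:
    """Hypothetical collector that returns rows it was given."""

    def __init__(self, rows):
        self.rows = rows

    def collect(self):
        return list(self.rows)


class LowercaseEmails:
    """Hypothetical transformer that only normalizes email casing."""

    def transform(self, rows):
        return [{**row, "email": row["email"].lower()} for row in rows]


class DedupeByEmail:
    """Hypothetical replacement transformer: normalize, then keep one row per email."""

    def transform(self, rows):
        unique = {}
        for row in rows:
            email = row["email"].lower()
            unique.setdefault(email, {**row, "email": email})
        return list(unique.values())


class RecordingActivator:
    """Hypothetical activator that records what it would have sent."""

    def __init__(self):
        self.sent = []

    def activate(self, rows):
        self.sent.extend(rows)
        return len(rows)


events = InMemoryEvents([{"email": "Ana@Example.com"}, {"email": "ana@example.com"}])

synced_before = ComposableCDP(events, LowercaseEmails(), RecordingActivator()).run()
# Requirements change: swap only the transformation layer, keep the rest.
synced_after = ComposableCDP(events, DedupeByEmail(), RecordingActivator()).run()
```

Swapping `LowercaseEmails` for `DedupeByEmail` changes the result without touching collection or activation, which is the property the composable approach is counting on.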

Why best-of-breed matters

Software commonly goes through cycles of “bundling” and “unbundling”, but a few things remain consistent:

🙅‍♀️ No one wants vendor lock-in

⏱️ Everyone wants fast time-to-value

🫰 Everyone wants a low total cost of ownership

Generally, organizations are tempted by the promise of fast time-to-value and low cost of ownership from all-in-one solutions, which is why they compromise on vendor lock-in. Though you may initially be quoted a lower number for an all-in-one CDP, the volume-based pricing model is designed to squeeze you for all you're worth. Vendors will often heavily discount the first year for this reason (just ask any Segment customer). 🤷

Ultimately, when it comes to the CDP space (which has really only been around for the past 5 years) there is no true all-in-one solution. Organizations that buy all-in-one CDPs are forced to ask themselves which part of the stack they are willing to compromise on and make trade-offs on capabilities that are actually critical to their business today. 

Companies may feel like it’s worth betting on an all-in-one solution that has strong data collection capabilities today (e.g. RudderStack) in the hope that its data activation features might catch up, but if we’ve learned anything from CDPs, it’s that the all-in-one promise has not come to fruition. 😞

The benefit of buying best-of-breed tools (as the name would suggest) is that you truly get the best-in-class for that piece of the stack. 🏆 As an added bonus, you get to future-proof your stack since you can switch out modular pieces as needed. Data and software stacks have functioned this way for at least the last 10 years, which has enabled so much of the innovation in the space (contrast this with the monolithic times of Oracle and Teradata).


The future is warehouse-native 🔮

Every company eventually needs to invest in some sort of “customer data platform.” But rather than buying an expensive all-in-one solution, why not leverage the building blocks for a composable CDP built on your existing data warehouse? That way, you can start delivering value today and avoid yet another copy of your data.

There’s no denying that when it comes to the future of the CDP, the tailwinds are heading in the direction of the data warehouse. CDP vendors like Twilio Segment and mParticle have recognized this by recently announcing their own versions of “reverse ETL connectors,” but their rushed attempts to “slap on” reverse ETL are clearly an afterthought.

With data activation and reverse ETL democratizing access, the data warehouse is best positioned to become the system of customer record that powers not just your marketing technology, but the entire business operations of the company. 🚀

💡 Learn more about how Census can help you activate the place your customer data already lives. Book a demo with one of our product specialists to see how we can help you build granular segments, and sync customer data to all your marketing and advertising tools, without any code.
