Product News

Announcing General Availability of Real-Time Reverse ETL, with Confluent Cloud and Apache Kafka® as sources

Will Voutier January 17, 2024

Last October, when we announced Live Syncs and our intent to create the first-ever Real-Time Composable CDP, we were aware that it was beyond what our customers expected. What they were looking for was faster Reverse ETL syncs, not necessarily activation in single-digit seconds (aka “true” real-time). 

However, at Census we like to push the limits of what’s possible. We’re actively building towards a future where every customer interaction can be real-time. Imagine instantly sending a targeted notification based on a customer's real-time location, A/B testing the right incentive based on their profile data, and analyzing engagement in seconds — all powered by your single source of truth in the data warehouse.

The future of data is real-time, and today’s GA of Live Syncs is our first step towards enabling real-time speed for every single customer use case.

General availability of Live Syncs, enabling real-time data activation

Today, we’re thrilled to announce the general availability of Live Syncs with support for our first two sources — Apache Kafka and Confluent Cloud. We chose to support Confluent and Kafka natively from the ground up because Kafka is the gold standard in event-driven systems, and the authoritative source of truth for streaming event data at many enterprises. 

Live Syncs can activate from real-time streaming data sources with sub-second latency to unlock new high-speed use cases like geotargeted campaigns, abandoned cart notifications, and lead routing in over 200 business tools.

Apache Kafka is an open-source data streaming technology used by over 80% of the Fortune 100 for mission-critical, real-time applications. Confluent, founded by the original creators of Kafka, provides a cloud-native and complete data streaming platform available everywhere it’s needed to massively expand the value of real-time data while eliminating the costly operational burdens of infrastructure management.

“Enterprises around the world use Confluent’s data streaming platform to efficiently and securely implement a near unlimited number of real-time use cases to deliver high-value, differentiated customer engagement. Together, Census and Confluent make it possible for brands to operate around real-time, trusted data streams that equip business teams with the insights they need to build rich, personalized experiences.”
— Rob Taylor, Global Head of Technology Alliances, Confluent

Census now supports real-time, high-speed streaming data in addition to high-quality warehouse data

The advantages of Census Live Syncs

  1. Real-time use cases are no longer a limitation of the Composable CDP — Latency is one of the last barriers blocking some marketing teams from using the warehouse to power their operations. Now, marketers can unleash the full power of the data warehouse by leveraging Customer 360 profiles in real-time.
  2. Fully composable with no vendor lock-in — Census integrates seamlessly with any data stack and business tool, achieving maximum flexibility as a core tenet of the Composable CDP. Previous solutions like packaged CDPs required adoption of their entire system to even begin tackling high-speed use cases. Census prevents both vendor lock-in and unwieldy custom builds. 
  3. Works with existing event tracking infrastructure — Unlike packaged solutions, Census works wherever your source of event data is. The ability to “bring your own event collection” makes Census easier and faster to implement than any other real-time activation platform. Customers can leverage any variety of event collection solutions like CDIs, product analytics, or even custom tracking.

The journey to true real-time Reverse ETL

Leading brands like Sonos, Crocs, and Canva use Reverse ETL to sync customer data from their cloud data warehouse to downstream tools like CRMs and marketing platforms. This enables them to centralize business operations, customer data, and analytics across all teams in a single source of truth.

Ever since we created Reverse ETL in 2018, we’ve built an increasingly sophisticated and scalable data syncing engine that’s highly optimized for processing big batches of data. A Live Sync is different: it is a sync that runs forever, syncing records from the source to the destination as they change rather than on a periodic diff.

This is a big departure from how traditional Reverse ETL normally works. 

Traditional Reverse ETL, circa 2018

Traditional Reverse ETL: Batch-based diffing in the data warehouse

We built a new system to overcome the inherent latency in Reverse ETL. Because interacting with Kafka is very different from generating and diffing snapshots of a SQL query, building Live Syncs required us to re-engineer our sync architecture from the bottom up to activate data with sub-second latency.
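
To make the contrast concrete, here is a minimal sketch of the two styles. It is not Census's implementation, which this post does not describe; the names `diff_snapshots` and `run_live`, and the shape of a snapshot as a dictionary keyed by primary key, are assumptions made for illustration only.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]
Snapshot = Dict[str, Row]  # primary key -> row (hypothetical shape)


def diff_snapshots(previous: Snapshot, current: Snapshot) -> Dict[str, List[str]]:
    """Batch style: compare two full snapshots of a query result."""
    return {
        "added": sorted(k for k in current if k not in previous),
        "updated": sorted(
            k for k in current if k in previous and current[k] != previous[k]
        ),
        "removed": sorted(k for k in previous if k not in current),
    }


def run_live(events: Iterable[Tuple[str, Row]], send: Callable[[str, Row], None]) -> int:
    """Streaming style: forward each change as it arrives; no snapshot, no diff."""
    forwarded = 0
    for key, row in events:
        send(key, row)
        forwarded += 1
    return forwarded


if __name__ == "__main__":
    previous = {"u1": {"plan": "free"}, "u2": {"plan": "pro"}}
    current = {"u1": {"plan": "pro"}, "u3": {"plan": "free"}}
    print(diff_snapshots(previous, current))

    sent = []
    run_live([("u1", {"plan": "pro"})], lambda key, row: sent.append((key, row)))
    print(sent)
```

In the batch style, nothing reaches the destination until the next snapshot is taken and compared, so latency is bounded below by the sync interval. In the streaming style, each record is forwarded as soon as it is read from the source.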

Warehouse Streaming Reverse ETL: Near real-time
Warehouse Streaming Reverse ETL doesn’t yet meet the requirements of “true” real-time, i.e. activation in single-digit seconds

While many major data warehouses have made strides in recent years in ingesting streaming data, far fewer are able to stream out incremental query results. Streaming Reverse ETL is enabled by features from major data warehousing vendors, such as Snowflake’s Dynamic Tables and Databricks’s Streaming Tables. While it can be significantly faster than traditional data processing pipelines, it requires data teams to learn and implement an entirely new way of modeling their warehouse data, and the latency it offers is still limited. Realistically, Streaming Reverse ETL works in the frame of 1-5 minutes, not 1-5 seconds, which may not be fast enough for some real-time applications.

Real-Time Streaming Reverse ETL: True real-time data activation

With today’s release of Confluent and Kafka as Live Sync sources, we’re the first Reverse ETL platform to offer true real-time data activation. Our Live Syncs can activate off a real-time streaming data source to trigger actions in downstream tools with sub-second latency, and are built on top of the gold standard in streaming technology. We believe that any approach to true real-time data activation must involve Kafka, alongside or instead of streams available in cloud data warehouses.

Want to see it in action? We’re happy to show you live

Warehouse-Enriched Real-Time Reverse ETL: The best of both worlds

Census’s end goal — enriching real-time event streams with historical data in the data warehouse or lakehouse

That said, a streaming data source on its own lacks the historical data that you’ve worked so hard to model in your warehouse. Our ultimate goal is to offer Real-Time Streaming Reverse ETL with Warehouse Enrichment, allowing any Census customer to easily join their high-speed event streams with their high-quality warehouse data. This unlocks a new era of data activation, where you’ll be able to serve your customers better, instantly.
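
As a rough illustration of what joining an event stream with warehouse data could look like, the sketch below attaches profile attributes to each incoming event by key. This is only a conceptual example under stated assumptions: the function `enrich_events`, the `user_id` join key, and the in-memory `profiles` lookup are all hypothetical and do not describe how Census builds or plans to build this feature.

```python
from typing import Dict, Iterable, Iterator

Event = Dict[str, object]
Profile = Dict[str, object]


def enrich_events(events: Iterable[Event], profiles: Dict[str, Profile]) -> Iterator[Event]:
    """Attach warehouse profile attributes to each event, joined on user_id."""
    for event in events:
        # Unknown users get no extra attributes rather than being dropped.
        profile = profiles.get(str(event.get("user_id")), {})
        # Event fields win on key collisions so fresh data is never overwritten.
        yield {**profile, **event}


if __name__ == "__main__":
    profiles = {"u1": {"lifetime_value": 1200, "segment": "vip"}}
    events = [
        {"user_id": "u1", "type": "cart_abandoned"},
        {"user_id": "u9", "type": "page_view"},
    ]
    for enriched in enrich_events(events, profiles):
        print(enriched)
```

The design choice worth noting is that the lookup happens per event, so the stream keeps its low latency while the slower-changing historical attributes come from a precomputed table.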

How Live Syncs work under the hood

Live Syncs are a new type of sync in Census, available to all organizations with access to Continuous syncs.

When users create a sync, they can choose a Run Mode for that sync, either Live or Triggered. 

  • Triggered Syncs can be run manually, via API or external trigger, or on a schedule. 
  • Live Syncs are always running while they are enabled, syncing data in real-time.

Live run mode is available for syncs from select sources, starting with Confluent Cloud and Kafka, and replaces Continuous syncs for these sources. After a user connects their Confluent Cloud or Kafka cluster, they must define schemas for the Kafka topics they want to use via the Models tab.
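
To show why a schema is needed before a topic can be synced, here is a small sketch that checks a JSON-encoded message against a declared set of fields. The schema format, the `CART_SCHEMA` fields, and the `parse_message` helper are assumptions for illustration; they are not the format used in the Census Models tab.

```python
import json
from typing import Any, Dict

# Hypothetical schema for a cart-abandonment topic: field name -> expected type.
CART_SCHEMA: Dict[str, type] = {
    "user_id": str,
    "cart_value": float,
    "abandoned_at": str,
}


def parse_message(raw: bytes, schema: Dict[str, type]) -> Dict[str, Any]:
    """Decode a JSON message and keep only the fields declared in the schema."""
    record = json.loads(raw)
    parsed: Dict[str, Any] = {}
    for field, expected in schema.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        value = record[field]
        # JSON has a single number type, so accept integers where a float is declared.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            raise ValueError(f"field {field} should be {expected.__name__}")
        parsed[field] = value
    return parsed


if __name__ == "__main__":
    message = b'{"user_id": "u1", "cart_value": 42, "abandoned_at": "2024-01-17T09:00:00Z"}'
    print(parse_message(message, CART_SCHEMA))
```

Kafka itself treats message values as opaque bytes, so a declared schema is what lets a downstream sync map message fields to destination fields predictably.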

Our Confluent Cloud source integration was verified by Confluent as part of their Connect with Confluent technology partner program.

Getting started with real-time

We strongly believe that composable and warehouse-native marketing is the future, and that warehouse latency should no longer be a barrier to real-time customer engagement. 

Next on our roadmap is support for more Live Sync sources such as Snowflake and Databricks, as well as the ability to enrich streaming events with warehouse data.

If you share our vision of the data warehouse as the center of the business, we’d love to help you start building your Real-Time Composable CDP. Get a demo or start a free trial today.

Not yet a Confluent customer? Start your free trial of Confluent Cloud. New sign-ups receive $400 to spend during their first 30 days. No credit card required.

We look forward to hearing from you!
