
Built With Census Embedded: Labelbox Becomes Data Warehouse-Native

Jeff Sloan April 08, 2024

Jeff is a Senior Data Community Advocate at Census, previously a Customer Data Architect and a Product Manager. Jeff has strong opinions on LEFT JOINs, data strategy, and the order in which you add onions and garlic to a hot pan. Based in New York City.

Every business’s best source of truth is in its cloud data warehouse. If you’re a SaaS provider, your customers’ best data is in their cloud data warehouses, too.

That’s why Labelbox, an AI platform, implemented Census Embedded to enable rapid data onboarding from their customers’ data warehouses, all within a user-friendly and Labelbox-branded experience. 

With Census Embedded, Labelbox is one of the first applications to become fully data warehouse-native. Labelbox customers can now onboard their data from their cloud data warehouse and power workflows with fresh data 24/7.

Want to learn how they helped their application speak to data warehouses? Read on.

Data Onboarding: Labelbox enriches data warehouse data

Labelbox is an AI SaaS company – sitting at the heart of machine learning development workflows at companies like Walmart, NASA, and Warner Brothers. Good, fresh data is critical for them to be the most valuable to their customers.

Specifically, Labelbox automatically tags images, text, and audio to describe the contents of the files. Data scientists use this labeled data to develop their machine-learning models. In the bad old days, this work took teams of interns, outsourcing, and Mechanical Turk surveys. It was a slow, manual, and expensive process.

But Labelbox faces a fundamental challenge – to add value for their customers, they need access to their customers’ data. This data is often already sitting in a cloud data platform like Snowflake or Databricks, or a data lake built on GCS or S3.

Previously, customers were forced to build integrations from their data sources directly against Labelbox’s API. Labelbox wanted better for their users.

Census Embedded in Action: How it works

Labelbox uses Census Embedded to import data continuously from their customers’ data warehouses and 20+ other sources. 


"The benefit of this approach is that our customers can now sign up for Labelbox, rapidly onboard, connect to their warehouse, and they're done. Customers no longer have to write data pipelines into our API to get the most out of our platform." — Kahveh Saramout, Lead Data Engineer at Labelbox



The process is:

  1. Labelbox customers log into Labelbox, click to connect their data warehouse, and then add their credentials in a secure portal provided by Census Embedded. 
  2. Labelbox customers select the dataset they would like to import into Labelbox and map columns from their data to available fields in Labelbox.
  3. Finally, customers schedule their import jobs to run on a desired cadence.

All of the necessary connections and scheduled syncs are created by the Labelbox application via the Census Embedded API. Within the Census Embedded user interface, Kahveh and his team have deep observability and alerting capabilities to manage these imports at scale.
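The three steps above can be sketched as a small in-memory model. This is only an illustration of the flow (connect, map, schedule): every class, function, field name, and cadence option below is hypothetical and is not taken from the Census Embedded API, which this sketch does not call.

```python
from dataclasses import dataclass

# Hypothetical names throughout: this models the three onboarding steps
# in memory and does not call or mirror the real Census Embedded API.

ALLOWED_CADENCES = {"hourly", "daily", "weekly"}  # assumed options


@dataclass(frozen=True)
class WarehouseConnection:
    """Step 1: the customer connects a warehouse (credentials are not modeled)."""
    customer_id: str
    warehouse_type: str  # e.g. "snowflake" or "databricks"


@dataclass(frozen=True)
class ImportJob:
    """Steps 2 and 3: a chosen dataset, its column mapping, and a schedule."""
    connection: WarehouseConnection
    dataset: str
    column_mapping: dict  # source column -> destination field
    cadence: str


def build_import_job(connection, dataset, column_mapping, destination_fields, cadence):
    """Validate the customer's choices and return an import job description."""
    unknown = sorted(set(column_mapping.values()) - set(destination_fields))
    if unknown:
        raise ValueError(f"unknown destination fields: {unknown}")
    if cadence not in ALLOWED_CADENCES:
        raise ValueError(f"unsupported cadence: {cadence}")
    return ImportJob(connection, dataset, dict(column_mapping), cadence)


# Example usage with made-up values.
connection = WarehouseConnection(customer_id="acme", warehouse_type="snowflake")
job = build_import_job(
    connection,
    dataset="analytics.images",
    column_mapping={"image_url": "row_data", "id": "external_id"},
    destination_fields={"row_data", "external_id"},
    cadence="daily",
)
```

Validating the mapping and cadence before anything is scheduled mirrors the order of the steps: a job is only described once the connection, dataset, and mapping are all known.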


The Future is Data Warehouse-Native

Labelbox is a pioneer in data warehouse-native applications, but they certainly won’t be the last.

At Census, we believe every SaaS company will need to integrate with the cloud data warehouse. As SaaS providers integrate AI into their platforms, access to good customer data will become the critical difference between valuable, personalized features and generic fluff. 

Census Embedded is the best way to access your customers’ rich datasets, whether you’re importing or exporting from your SaaS application.

Interested in learning more about becoming data warehouse-native with Census Embedded? Request a demo today and learn how an embedded integration platform can help you onboard your customers faster and add exponential customer value.

Related articles

Best Practices
Keeping Data Private with the Composable CDP

One of the benefits of composing your Customer Data Platform on your data warehouse is enforcing and maintaining strong controls over how, where, and to whom your data is exposed.

Product News
Sync data 100x faster on Snowflake with Census Live Syncs

For years, working with high-quality data in real time was an elusive goal for data teams. Two hurdles blocked real-time data activation on Snowflake from becoming a reality:

  1. Lack of low-latency data flows and transformation pipelines
  2. The compute cost of running queries at high frequency in order to provide real-time insights

Today, we’re solving both of those challenges by partnering with Snowflake to support our real-time Live Syncs, which can be 100 times faster and 100 times cheaper to operate than traditional Reverse ETL. You can create a Live Sync using any Snowflake table (including Dynamic Tables) as a source, and sync data to over 200 business tools within seconds.

We’re proud to offer the fastest Reverse ETL platform on the planet, and the only one capable of real-time activation with Snowflake.

👉 Luke Ambrosetti discusses Live Sync architecture in-depth on Snowflake’s Medium blog here.

Real-Time Composable CDP with Snowflake

Developed alongside Snowflake’s product team, we’re excited to enable the fastest-ever data activation on Snowflake. Today marks a massive paradigm shift in how quickly companies can leverage their first-party data to stay ahead of their competition.

In the past, businesses had to implement their real-time use cases outside their Data Cloud by building a separate fast path, through hosted custom infrastructure and event buses, or piles of if-this-then-that no-code hacks, all with painful limitations such as lack of scalability, data silos, and low adaptability. Census Live Syncs were born to tear down the latency barrier that previously prevented companies from centralizing these integrations with all of their others. Census Live Syncs and Snowflake now combine to offer real-time CDP capabilities without having to abandon the Data Cloud.

This Composable CDP approach transforms the Data Cloud infrastructure that companies already have into an engine that drives business growth and revenue, delivering huge cost savings and data-driven decisions without complex engineering. Together we’re enabling marketing and business teams to interact with customers at the moment of intent, deliver the most personalized recommendations, and update AI models with the freshest insights.

Doing the Math: 100x Faster and 100x Cheaper

There are two primary ways to use Census Live Syncs: through Snowflake Dynamic Tables, or directly through Snowflake Streams.

  1. Near real time: Dynamic Tables have a target lag of minimum 1 minute (as of March 2024).
  2. Real time: Live Syncs can operate off a Snowflake Stream directly to achieve true real-time activation in single-digit seconds.

Using a real-world example, one of our customers was looking for real-time activation to personalize in-app content immediately. They replaced their previous hourly process with Census Live Syncs, achieving an end-to-end latency of <1 minute. They observed that Live Syncs are 144 times cheaper and 150 times faster than their previous Reverse ETL process.

It’s rare to offer customers multiple orders of magnitude of improvement as part of a product release, but we did the math.

|       | Continuous Syncs (traditional Reverse ETL) | Census Live Syncs | Improvement |
|-------|--------------------------------------------|-------------------|-------------|
| Cost  | 24 hours = 24 Snowflake credits. 24 * $2 * 30 = $1440/month | ⅙ of a credit per day. ⅙ * $2 * 30 = $10/month | 144x |
| Speed | Transformation hourly job + 15 minutes for ETL = 75 minutes on average | 30 seconds on average | 150x |

Cost

The previous method of lowest latency Reverse ETL, called Continuous Syncs, required a Snowflake compute platform to be live 24/7 in order to continuously detect changes. This was expensive and also wasteful for datasets that don’t change often. Assuming that one Snowflake credit is on average $2, traditional Reverse ETL costs 24 credits * $2 * 30 days = $1440 per month.
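As a quick check, the comparison figures above can be reproduced in a few lines. This is only a sketch that reuses the article's own inputs ($2 per credit, a 30-day month, 24 credits versus ⅙ of a credit per day, 75 minutes versus 30 seconds); the variable and function names are illustrative.

```python
from fractions import Fraction

# Inputs copied from the comparison; the $2 credit price is the article's assumption.
CREDIT_PRICE_USD = 2
DAYS_PER_MONTH = 30


def monthly_cost(credits_per_day):
    """Monthly Snowflake spend in USD for a given daily credit usage."""
    return credits_per_day * CREDIT_PRICE_USD * DAYS_PER_MONTH


continuous_cost = monthly_cost(24)             # warehouse live 24/7
live_cost = monthly_cost(Fraction(1, 6))       # 1/6 of a credit per day
cost_ratio = Fraction(continuous_cost) / live_cost

continuous_latency_s = 75 * 60                 # 75 minutes on average
live_latency_s = 30                            # 30 seconds on average
speed_ratio = Fraction(continuous_latency_s, live_latency_s)

print(f"cost: ${continuous_cost} vs ${live_cost} per month ({cost_ratio}x)")
print(f"speed: {continuous_latency_s}s vs {live_latency_s}s ({speed_ratio}x)")
```

`Fraction` keeps the ⅙-credit figure exact, so the ratios come out as whole numbers rather than floating-point approximations.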
Using Snowflake’s Streams to detect changes offers a huge saving in credits to detect changes, just 1/6th of a single credit in equivalent cost, lowering the cost to $10 per month.

Speed

Real-time activation also requires ETL and transformation workflows to be low latency. In this example, our customer needed real-time activation of an event that occurs 10 times per day. First, we reduced their ETL processing time to 1 second with our HTTP Request source. On the activation side, Live Syncs activate data with subsecond latency.

1 second HTTP Live Sync + 1 minute Dynamic Table refresh + 1 second Census Snowflake Live Sync = 1 minute end-to-end latency.

This process can be even faster when using Live Syncs with a Snowflake Stream. For this customer, using Census Live Syncs on Snowflake was 144x cheaper and 150x faster than their previous Reverse ETL process.

How Live Syncs work

It’s easy to set up a real-time workflow with Snowflake as a source in three steps:

Best Practices
How Retail Brands Should Implement Real-Time Data Platforms To Drive Revenue

Remember when the days of "Dear [First Name]" emails felt like cutting-edge personalization?