
Segment vs Snowplow: From broken promise to robust solution

Simon Pickerill · May 10, 2023

Simon is the content marketing manager at Oyster, working to create a world where companies everywhere can hire people anywhere. He is a human-centric content marketer with a love for long-form, premium content projects.

Customer Data Platforms (CDPs) promise a quick win with your data. As an off-the-shelf data solution, CDPs offer to do the hard work for you—generate customer data from your products and platforms, model it, and deliver it to your tools of choice. They promise to provide a single source of truth right out of the box, so that the coveted Customer 360 is not only possible, but easily attainable. 👀

This all sounds wonderful, but behind the shiny promises are fundamental problems with the way CDPs operate which can set data teams and marketers back where they started, or worse. 

Choosing a packaged CDP as your primary data platform can leave you with an incomplete data set, data you can’t trust, and a vendor you’re locked into. Let’s dive deeper into one example with Segment, and explore the challenges of working with a CDP up close. 🤔

Segment’s customer data platform 

Segment started out as 400 lines of JavaScript that could send data to just eight destinations: Google Analytics, KISSmetrics, Mixpanel, Intercom, Customer.io, CrazyEgg, Olark, and Chartbeat. Several years and product iterations later, Segment transformed into a customer data platform, promising to make it easy to send data to hundreds of destinations. 

Segment’s product messaging lists several benefits that appeal to marketing teams and the wider organization: the ability to build a single customer view, personalize the customer experience in real time, and segment your audience based on behavioral data. These are all great use cases for the marketing team, but Segment and traditional CDPs often fail to live up to their promises, leaving behind a disjointed, incomplete data set you can’t put to good use. 

Low visibility of your customers

The richness of the data you can collect with Segment is highly limited compared to best-in-class data generation tools. If you’re working towards building a cohesive single customer view, the more data points and context you can gather behind user behavior, the better. Ideally, you’ll have plenty of data to help you identify users or buyers at the individual level by joining together user identifiers across all your platforms and products.

But event data from Segment is limited to a small number of data points per event, compared to 100+ data points possible through a dedicated behavioral data platform (more on that later). This limits your ability to develop a comprehensive understanding of your customers—worse still, you could be missing key data to inform your marketing strategy. Limited behavioral data means your view of the customer could be incomplete, fragmented, or just plain wrong.

No control over your data, or data quality

CDPs like Segment are essentially “black boxes”, leaving you in the dark about how your data is collected and transformed. That’s a major red flag. 🚩 Without any say over how data is generated, you can’t account for its quality or usability. You also can’t decide how to track your key events, or what types of data should be collected. 🤨

This might not be a problem if you’re just sampling data. But for more complex use cases, such as personalization, lack of control is a big deal. You’re forced to work with Segment’s predetermined data structures, which may not necessarily fit with your business model or match your expectations for data quality. 

Segment also siloes off your raw data, holding it within its platform and limiting your ability to unify data across multiple platforms and products into a single source of truth. And since you’re not in control of your data collection, you can’t easily mitigate the impact of privacy measures such as browser restrictions and ad blockers. You’re left with major blind spots, with little to no visibility of key customers who use browsers that block tracking, like Safari and Firefox. 🦊

This lack of control severely limits your ability to analyze and/or work with the data and, ultimately, could lower your confidence in your data’s integrity. 

Locked in, with no ownership

Since Segment operates on a SaaS model, your data is locked into a platform you don’t own or control. If Segment suffers an outage or is compromised in any way, your data is vulnerable since it’s held in their cloud environment. ☁️

No big deal... Segment can be trusted with your customers’ data, right? Alarmingly, though, software companies experience an average of 12 incidents of unplanned application downtime every year. At the time of writing, Segment has reported nine disruptions to its services this month alone. 

Every minute data goes missing could be costing your business thousands of dollars, depending on how you use data and the scale of your operation. Even in the best-case scenario, any data loss event is a major concern for customers who are increasingly conscious of their privacy, and how their data is stored and used. 

Then there’s the vendor lock-in itself. Segment’s scaling price structure can quickly become expensive as your data volumes grow, and the platform isn’t designed to be flexible or make it easy to evolve your infrastructure. That combination can leave you with an eye-watering bill if your volumes increase significantly. 💸

Snowplow’s behavioral data platform

Created in 2012, Snowplow emerged as an open-source data collection tool in the early days of data warehousing. Redshift had just been launched, and many hacky, one-man-band data MacGyvers adopted Snowplow as a free, versatile solution for tracking events across web and mobile. 💪

Fast forward to today, and Snowplow is a robust, flexible data solution, trusted by companies like Strava, Autotrader, and Gousto to create, enhance, and model high-quality first-party customer behavioral data. Let’s take a look at what makes Snowplow the platform of choice for data generation compared to CDPs like Segment. 

Rich, behavioral first-party customer data

Snowplow generates and delivers rich, granular, first-party behavioral customer data into your warehouse or data lake, made up of hundreds of data points per event that show you what your customers are doing. A single Snowplow event can give you a wealth of insight into customer behavior: who they are, where they’re based, and how they’re interacting with your product or web pages, or even making in-store purchases. 🕵️
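To make that concrete, here’s a minimal sketch of what a handful of those fields could look like, expressed as a TypeScript type. The field names are drawn from Snowplow’s canonical event model, but this is only an illustrative subset, and exact column names can vary by tracker version and warehouse loader.

```typescript
// Illustrative subset of a Snowplow enriched event, typed for clarity.
// The real enriched event has 100+ columns spanning user, session, page,
// marketing, device, browser, and geo context; this sketch shows only a few.
interface SnowplowEnrichedEvent {
  // Event identity and timing
  event_id: string;                  // unique ID for this event
  collector_tstamp: string;          // when the collector received the event
  event_name: string;                // e.g. "page_view" or a custom event name

  // User and session identifiers (the keys you join on for a single customer view)
  user_id: string | null;            // your own logged-in user ID, if set
  domain_userid: string | null;      // first-party cookie identifier
  domain_sessionidx: number | null;  // session counter for this user

  // Page and marketing context
  page_url: string | null;
  page_referrer: string | null;
  mkt_source: string | null;         // e.g. utm_source
  mkt_campaign: string | null;       // e.g. utm_campaign

  // Device, browser, and geo enrichment
  geo_country: string | null;
  geo_city: string | null;
  br_lang: string | null;            // browser language
  dvce_ismobile: boolean | null;     // whether the device is mobile
}
```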

Because Snowplow delivers data in a raw, unchanged, “unopinionated” format, it’s in an ideal state for combining with other data sets to build a truly unified view of the customer. You’re also able to model the data in a way that makes the most sense for you and your business, rather than relying on the prepackaged data that comes from a CDP. The end result is a rich, powerful data set that can help you deeply understand your customers and prospects, without missing data or gaps in your customer 360. 

Unmatched data quality

When it comes to how you track your data, Snowplow puts you in the driver's seat. 🏎️ Snowplow lets you structure your event data with self-describing schemas, which is a bit like writing a blueprint for the type of events you want to track before you track them. 

That means your data set matches what you expect by the time it lands in the warehouse. In other words, your tidy, well-structured data will need very little (if any) cleaning, which is a joy for data teams and analysts. 
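As a rough illustration of how that blueprint shows up in tracking code, here’s a minimal sketch using Snowplow’s @snowplow/browser-tracker package. The collector URL, schema reference, and event fields below are hypothetical placeholders; the point is that every event declares the schema it is expected to conform to, so payloads that don’t match get flagged as failed events in the pipeline rather than landing silently in your warehouse.

```typescript
import { newTracker, trackSelfDescribingEvent } from '@snowplow/browser-tracker';

// Point the tracker at your own collector endpoint (hypothetical URL).
newTracker('sp1', 'https://collector.example.com', {
  appId: 'my-web-app',
});

// Track an event against a schema defined up front in your schema registry.
// 'iglu:com.example/add_to_basket/jsonschema/1-0-0' is a hypothetical
// schema reference; swap in the vendor, name, and version you actually use.
trackSelfDescribingEvent({
  event: {
    schema: 'iglu:com.example/add_to_basket/jsonschema/1-0-0',
    data: {
      sku: 'SP-1234',
      quantity: 2,
      unit_price: 19.99,
    },
  },
});
```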


Unlike with Segment, the quality of your Snowplow data is something you can control and rely on. In fact, Snowplow data can be optimized to a point where you can trust it for advanced data use cases like feeding machine learning models or building predictive analytics engines.

The best part about Snowplow’s infrastructure is that it’s highly adaptable: as you evolve your product or start tracking new features, you can adapt your schemas to ensure no loss of data or compromise to quality over time. 

Flexibility and total data ownership

Unlike Segment and other CDPs, it’s possible to run Snowplow natively in your cloud environment, so you can have total ownership of the end-to-end pipeline. 

Essentially, when you use Snowplow, you know you’re not sending data off to a third party where its security and governance are outside of your control. Everything lives under your roof, so to speak: your data, your customers’ data, and any sensitive information never leave the safety of your own environment. 🔒 For large companies, and for anyone with privacy concerns, this is massive.

Better together: Snowplow and the modern data stack

Since Snowplow is the best-in-class tool for creating first-party customer behavioral data, it’s perfectly suited to work alongside other platforms to form a complete data stack. Building a stack of purpose-built tools has a lot of advantages, including the freedom to piece together your end-to-end data pipeline and the ability to deliver extremely high-quality data into your tools of choice. 🤖 This not only solves your tooling problem; more importantly, it helps solve the customer data problem.

Unlike a CDP, which assumes the role of data collection, transformation, and activation, you’re free to combine Snowplow with tools like dbt to model your data on your own terms and build a centralized source of truth in your data warehouse. From there, you’re laying a path to democratize your data, opening it up to teams across the organization, from marketing and sales to product and beyond. 

In this way, rather than buying a CDP that siloes off your data and limits your control, you’re composing one yourself. This unlocks countless opportunities for your data. And when your behavioral customer data is centrally owned and governed in the warehouse, you’re in the best position possible to govern it properly, combine it with other data sets (sales data, for example), and use tools like Census to make it operational to frontline teams. 

Instead of being siloed away in a black box, your data should be centrally owned and available. That’s a reality with Snowplow, while it's much more difficult — if not impossible — with CDPs like Segment. 

Behavioral first-party customer data in minutes

Since the recent launch of BDP Cloud, it’s now easier than ever to deploy Snowplow technology hosted and managed by Snowplow. This means it’s possible to benefit from Snowplow’s robust infrastructure without the barriers of setting up your own cloud environment or the time it takes to host Snowplow yourself. 🔥

Technology like BDP Cloud brings the speed of access and ease of use that traditional CDPs like Segment promise, but with no loss of integrity. It makes future-proof solutions like Snowplow an even easier choice for delivering data that, unlike CDPs, actually keeps its promises. ✅

Join the 1.9 million websites and apps already using Snowplow and start harnessing the power of first-party customer data today.

🚀 And when you want to activate all that first-party data, turn to Census. Book a demo with a product specialist to find out how.
