
In context: Answer data questions that aren't asked

Nicole Mitich · June 03, 2022

Nicole Mitich is the content marketing manager @ Census. She's carried a love for reading and writing since childhood, but her particular focus is on streamlining technical communication through writing. She loves seeing (and helping) technical folks share their wisdom.

The value of data isn’t in the quantity you have, it’s in its usefulness. A huge dataset might make you feel like you have all the answers, but unless you can put it in context, all you have is a lot of noise. 📺

Take the weather, for example. While Census Co-founder and CEO Boris Jabes was chatting with Avo Co-founder and CEO Stefania Olafsdottir about the weather in her native Iceland, they broached the subject of wind. Despite how notoriously windy the country is, Icelandic weather forecasts rarely comment on it – even though wind can be the deciding factor between basking poolside and staying indoors.

“It’s almost like cultural context, which is hard to teach a computer,” Boris said. “I remember when I was a kid not understanding why there were all these supercomputers just for the weather. Now I understand: It’s because there are so many variables.”

While you might feel pretty confident in your decision to go to the pool based on comprehensive data about temperature, humidity, and precipitation, the wind helps put everything else into context. 🍃

Context is king 👑

In simple data terms, context refers to the network of connections among data points. It affects everything in data collection – from which data points should be gathered to who gathers that data and, ultimately, how they define it. For that reason, Stefania is not a fan of the term “data scientist,” because it can describe so many different facets of what data people do. Essentially, it’s an umbrella term that groups all data folks together without really labeling any of them properly. ☂️

“It’s probably similar to being a software engineer: You don’t hire generic,” she explained. “You can have 10 years of really juicy infrastructure engineering, but no skills whatsoever in front-end development. Data science is a similar thing.”

Data teams at most young companies start out just getting the leadership team up to speed on the business's inner workings; basically, providing a high-level rundown of where things stand and what's recently happened. A strong data analyst will eventually begin adding context to that data – so their reports are no longer looking backward, but forward. ⏩

The way they go from reporting on what happened last quarter to influencing the company's future direction is similar to how we add context to the data: By clarifying what we’re being asked to uncover. Every person on a data team should be challenged to respond to questions by asking, “What do you mean?” 🤔 Before diving into the “how”, data teams need to understand the “what” and “why.”

“There are so many layers to defining a question,” Boris explained. “There’s the dictionary definition of the word, there's the semantics within our company, and there’s the final version they actually want you to answer.”

Take a straightforward question like, “How many users do we have?” Before you can answer it, you need to understand how “user” is being defined (the sketch after the list below shows how much the answer can swing).

Do you become a user when you download the app?

When you accept the terms and conditions?

Or not until you’ve completed an activity within the app?
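To make that concrete, here's a minimal sketch in Python – the event names and data are made up for illustration, not any particular product's schema – showing how each definition produces a different count:

```python
# Hypothetical event log: (user_id, event_name, date) — illustrative only.
from datetime import date

events = [
    (1, "app_download",  date(2022, 6, 1)),
    (1, "accept_terms",  date(2022, 6, 1)),
    (1, "complete_task", date(2022, 6, 2)),
    (2, "app_download",  date(2022, 6, 1)),
    (2, "accept_terms",  date(2022, 6, 3)),
    (3, "app_download",  date(2022, 6, 2)),
]

def users_where(required_event: str) -> set[int]:
    """Distinct user ids that have performed the given event."""
    return {uid for uid, event, _ in events if event == required_event}

print(len(users_where("app_download")))   # 3 — if a download makes you a user
print(len(users_where("accept_terms")))   # 2 — if accepting the terms does
print(len(users_where("complete_task")))  # 1 — if completing an activity does
```

Same data, same question, three different answers – which is exactly why the definition has to come first.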

We’ve said it before, and we’ll say it again: Context 👏 is 👏 everything. 👏

When you know the contextual parameters of a question, you can build an infrastructure that is strategically designed to evolve as the pertinent questions mature.

Inevitably, definitions will shift over time. Metrics and KPIs will change as products move through their lifecycles. A forward-looking data team will consider whether they'll want to answer different versions of the question in the future, or whether the data is structured to support only one definition. In other words, is the infrastructure built to evolve?

“You need the flexibility to modify how you gather data while staying consistent, so you can report on things year-over-year,” Stefania said. “But at the end of the day, you have to start somewhere. Don’t try to make it perfect – because it will never be perfect.”
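One way to keep that flexibility is to version the metric definition itself, so old and new meanings of “user” stay reportable side by side. Here's a minimal sketch – the version names, event names, and data are hypothetical:

```python
# Versioned metric definitions (illustrative sketch, not a specific tool's API).
from typing import Callable

# Hypothetical event log: (user_id, event_name).
events = [(1, "app_download"), (1, "complete_task"), (2, "app_download")]

USER_DEFINITIONS: dict[str, Callable[[list[tuple[int, str]]], set[int]]] = {
    # v1 (original definition): anyone who downloaded the app
    "v1_downloaded": lambda ev: {uid for uid, name in ev if name == "app_download"},
    # v2 (later definition): anyone who completed an in-app activity
    "v2_activated": lambda ev: {uid for uid, name in ev if name == "complete_task"},
}

def user_count(version: str) -> int:
    """Count users under a specific, named definition."""
    return len(USER_DEFINITIONS[version](events))

print(user_count("v1_downloaded"))  # 2 — keeps year-over-year reports consistent
print(user_count("v2_activated"))   # 1 — serves the newer version of the question
```

Because every definition stays callable by name, year-over-year reports can keep using v1 while new dashboards adopt v2 – how you gather evolves, what you report stays consistent.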

A shared frame of reference aligns teams

Contextualizing your data makes it less abstract to other teams in your company. When your data makes more sense and your data team can internally show how its findings impact business decisions, you can get your product and development teams amped up about analytics. 💪

Not only does this hype make product and development folks eager to bring the data team into the development cycle earlier, it also recovers the time and resources once wasted by looping the data team in only at the very end. When you incorporate the data team at the start of your development cycles, you streamline decision-making and drive strategic decisions earlier.

“It’s a company milestone to have the product and development teams excited to understand the impact of what they’re doing,” Stefania said. “Such a huge part of how metrics are used involves having stakeholders aligned on what you’re trying to measure. That turns into the product stakeholder, the data stakeholder, the developer, and the designer talking about the goal of the feature and how to measure its success before it’s released.”

In a positive product culture, the data team may have minimal involvement in actually developing the product roadmap – but they're still included in the conversation from the very beginning.

Why? Those conversations provide the crucial context that helps the data team think about what questions need to be answered to deliver the insights the company needs.

So, let's backtrack to the wind discussion. A simple question like, “How warm will it be on Friday?” could be answered with “70 degrees.” Without context, that sounds perfect – but with the right context, the data team knows what you're really asking: “Will Friday be a good day to schedule a picnic?”

Now, if that 70-degree temperature forecast is predicted to be paired with 40 mph wind gusts, that likely changes the answer to, “Friday might not be the best day for a picnic.” 🥶
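In data terms, the real question is a function of more than one variable. A toy decision rule – the thresholds below are invented purely for illustration – shows how adding wind flips the answer the temperature alone would give:

```python
# A toy "picnic day" rule with made-up thresholds — illustrative only.
def good_picnic_day(temp_f: float, wind_mph: float) -> bool:
    return 60 <= temp_f <= 85 and wind_mph < 15

print(good_picnic_day(temp_f=70, wind_mph=5))   # True  — 70°F and calm
print(good_picnic_day(temp_f=70, wind_mph=40))  # False — same 70°F, gusty
```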

A single data point is useless. As a matter of fact, even hundreds or thousands of data points are useless on their own. The truth is, no matter how rich your data is or how diverse your data sources are, context is key to delivering real, measurable impact.

Want to learn more? You can catch the full conversation between Boris and Stefania below, or on your favorite streaming platforms. 🎧

Got thoughts and opinions about this topic? Join the conversation around this, and many other data best practices, in ✨ The Operational Analytics Club. ✨
