The secret to collaborating in the modern data stack world | Census

Parker Rogers March 21, 2022

Parker is a data community advocate at Census with a background in data analytics. He's interested in finding the best and most efficient ways to make use of data, and in helping other data folks in the community grow their careers.

Salt Lake City, Utah, United States

Here’s an activity for the data professionals reading this: Ask one of your stakeholders, “What do you believe is the most efficient way for us to collaborate?” Odds are, your answer to this question and theirs are misaligned.

Similarly, if a stakeholder were to ask you the same question, they’d likely be a bit surprised by your answer, too.

Sure, you could do this activity in any work relationship, and the reactions won’t be 100% satisfactory. However, I find the relationship between data professionals and stakeholders to be particularly interesting. As a result of the modern data stack era, data professionals have quickly evolved from a mere luxury to a necessity in organizations. Beyond simply building dashboards and helping inform long-term decision making, data professionals have inched closer to their organization's operations, and are tasked with supporting immediate revenue-driving activities. Not coincidentally, stakeholders are collaborating with data professionals more frequently than in years prior.

Although collaboration has quickly become more frequent, it hasn’t improved at the same rate. I’ve observed this both personally and through various conversations with data professionals.

I’ve thought about this so much, in fact, that I decided to write a blog about collaboration in the modern data stack world. This is not a straightforward “how to” blog (so I guess the title is misleading!). I don’t believe organizations can simply adopt a new process to answer this question. Sure, processes make a significant impact, and we’ll go over them in detail, but education and a change in behavior are required from both parties, and that isn’t quite as easy as swiping a card and giving everybody a seat license.

First, let’s learn more about data professionals in the modern data stack world.

Data professionals in the modern data stack world

Rewind the clock 10 years and think about the responsibilities of data professionals. We were aggregating and cleaning data, generating reports, performing analysis, and providing insights. Our contributions were valuable, but the amount of data we collected was minuscule compared to today, and we weren’t as essential to an organization’s operations. Additionally, unless you suffer from the Mandela effect, the term “modern data stack” didn’t exist back then.

Fast forward to today. In addition to the responsibilities above, we have a plethora of other responsibilities that piggyback off the modern data stack: automating “real time” workflows and pipelines, managing and maximizing each department’s frontline tools, enabling self-serve data access, and so on. As a result, we’re more central to our organization’s operations and revenue-driving activities.

This increase in responsibility is fortunate (it gives us a seat at the table), but it also means we must develop a deeper understanding of our organizations as a whole and the OKRs of individual stakeholders. We should be more attentive and engaged in company-wide meetings and updates, goal planning sessions, etc. Through education, our deliverables will become clearer, the data we manage will become more purpose-driven, and we’ll collaborate more efficiently with stakeholders.

On the flip side, stakeholders should become more knowledgeable about data-driven culture, the modern data stack, and how it increases our responsibilities. It will help them better understand our bandwidth, capabilities, implications of requests, and ultimately help us collaborate more efficiently. If they fail to do so, trust will diminish. They’ll be disappointed in the outcomes and timelines of deliverables, and they’ll be less likely to achieve their OKRs.

Stakeholders in the modern data stack world

The modern data stack has made stakeholders (and any non-engineer employees, for that matter) more data literate and technical, so much so that there’s even a term for these unofficial technical data users: citizen data scientists. Even in the absence of a top-notch data product, they constantly run their own analyses for short-term decision making.

Just as the push toward product-led growth (PLG) in recent years increased the attention given to data teams, it pushed stakeholders (and the ops teams below them) to learn more about customer data, and what that data means for their projects, campaigns, and targets. And with this more data-driven focus has come more data-driven OKRs, which (ideally) all stakeholder work is closely tied to.

This means that whether a stakeholder is in sales, marketing, customer success, or another department, they utilize data and SaaS platforms to fulfill their responsibilities. SQL-savvy stakeholders frequently have a deep understanding of their tools, as well as a growing understanding of the tools in the data team’s arsenal, and they are increasingly capable of completing more than a handful of the requests they give to data teams.

So what does this mean for data professionals? Well, it means that, much of the time, our stakeholders are the ones who best understand what data they need, because that data is directly related to the OKRs tied to larger business outcomes. As such, it’s pivotal that data teams not only recognize the growing technical expertise of the people requesting work from them, but also honor that those people know what data is most useful to them.

When they submit a simple request, or introduce a new, long-term data initiative, it is based on what they believe to be possible and manageable. I like how Sarah Krasnik described it:

“It’s not the place for the [data] team to decide what data gets sent. You’re not the ones using the output, so you shouldn’t be the ones deciding the input.”

If data professionals fail to understand their stakeholders’ needs and expertise, and fail to execute their requests, they won’t benefit their organization, and they’ll lose the trust of a key business ally.

On the bright side, when both parties better understand each other's responsibilities and expertise, and change their behavior accordingly, the collaboration will become dramatically more efficient. Additionally, trust will grow between both parties rather than the typical ebbs and flows of disappointment. For example, a stakeholder is less likely to set an unrealistic timeline for a project when they have a baseline understanding of its requirements. Similarly, a data professional won’t throw a project on the backlog for several months when they recognize that a stakeholder’s request is directly tied to a major OKR.

How to collaborate more efficiently in the modern data stack world

As I mentioned earlier, this is not a straightforward “how to” blog. The information below can help improve collaboration, but it will be in vain unless both parties understand each other’s responsibilities and adapt accordingly.

I’ll focus on two areas for achieving efficient collaboration:

  1. Channels of collaboration. Where communications should happen depending on the problem at hand.
  2. How to collaborate. How to kick off, plan, and manage a project. This is particularly useful for larger projects.

Channels of collaboration

Josh Richman (senior manager, business analytics, FLASH) and I discussed this recently in an OA coffee discussion. The topic was “How stakeholders and data professionals can collaborate more efficiently,” and it inspired me to write this blog! During the discussion, Josh shared the process that stakeholders use for requests:

I like this mental model for thinking through collaboration methods because it keeps both parties organized and conscious of each other's time. Additionally, when larger requests live outside the black hole that is Slack, they’re less likely to be forgotten, and more likely to get completed in an appropriate amount of time.

Next time someone has a large, context-heavy request, perhaps ask them to fill out a Jira ticket or write an email accordingly.

How to collaborate

Irina Kukuyeva’s blog The "dark arts" of stakeholder management sums up how stakeholders and data professionals should collaborate better than I ever could. I highly recommend giving it a read.

In Irina’s 10+ years of stakeholder management, she’s found stakeholder alignment is more difficult than the actual analysis of data. You likely already have your own horror stories related to stakeholder misalignment, and you know that the outcomes can include a significant loss of trust and opportunity costs. To help avoid this, Irina put together a game plan. Here’s the TL;DR:

To ensure alignment for data projects (and collaboration in general), there are three questions that stakeholders and data professionals alike should aim to answer:

  1. What do you need to be on the same page? Context about the problem, ideal outcome, deadlines, check-in cadence, etc.
  2. How do you know if you’re on the same page? Simply put, stakeholders are not surprised by the project plan or results!
  3. How do you actually try to get on the same page? Continuously check to make sure you’re providing all the needed context around the direction and progress of the work: updates on blockers, managing additional requests, losing sight of the end goal, etc.
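
One way to make the first question concrete is to treat the needed context as a checklist. The sketch below is only an illustration, not something from Irina's post: the `ProjectBrief` class, its field names, and the `missing_context` helper are all hypothetical, chosen to mirror the items listed above (context about the problem, ideal outcome, deadlines, check-in cadence).

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProjectBrief:
    """Hypothetical kickoff brief; field names are illustrative assumptions."""
    problem_context: Optional[str] = None
    ideal_outcome: Optional[str] = None
    deadline: Optional[str] = None
    check_in_cadence: Optional[str] = None


def missing_context(brief: ProjectBrief) -> List[str]:
    """Return the names of brief fields that are still empty or blank."""
    return [
        f.name
        for f in fields(brief)
        if not (getattr(brief, f.name) or "").strip()
    ]


# Example: a request that arrives with only part of the context filled in.
brief = ProjectBrief(
    problem_context="Sales needs product usage data in the CRM",
    deadline="End of quarter",
)
print(missing_context(brief))
```

Running this prints `['ideal_outcome', 'check_in_cadence']`, which is a prompt for what to ask before work starts. A shared doc or ticket template would do the same job; the point is agreeing on the fields, not the tooling.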

Like Irina, I believe that stakeholder management is more difficult than the actual data project. If you take away some of the best practices she outlines, you’ll collaborate more efficiently with your stakeholders and deliver higher quality projects.

Where to go from here

I strongly encourage you to understand the responsibilities of your stakeholders and to make sure they understand yours. Be empathetic and remember: We’re all humans trying to make the best decisions with the data available to us. If mutual understanding and respect aren’t present, moving to a new process will improve your collaboration only slightly at best. Only once understanding and respect are mutual should you move forward with the “how to” collaboration suggestions in this blog.

I hope it helps you collaborate better in this modern data stack world! If you have any questions, or if you’d like to discuss this topic more, you can find me (and the folks whose work I reference) in The OA Club, as well as via Twitter, LinkedIn, and email.
