Interviews

How Prolific’s own Jim Lumsden supercharged their sales team with product usage data

Roslyn Coutinho · December 12, 2022

Roz doubles as a product marketing manager and brand marketing manager at Census. She's a designer turned product manager, driven by curiosity and a constant desire to learn. San Francisco, California, United States

This debut edition of the Select Stars series was updated on December 12 to expand on Prolific’s use case. This Census series of profiles highlights data practitioners driving more impact with Operational Analytics.

Jim Lumsden is the data lead at Prolific, a two-sided marketplace that helps researchers of all kinds conduct online research. In this interview, Jim shares how he became the first data hire at Prolific, and how he drove more impact with the company’s brand new sales team by syncing key product analytics to HubSpot with the modern data stack, including Census, Snowplow, and Redshift.

What led you to a career in data? 

I completed a PhD in online research methods at the University of Bristol, tackling the question: “How can we make data collection better?” During my PhD I was actually a Prolific customer – I was running lots of online studies, looking at what types of people were taking part in research on the internet, specifically uncovering why they were doing it and how we could incentivize them. By the end of my PhD, it was quite natural to step into a role at Prolific. When I joined, the company was only eight people, and as the first data hire I don't think I really knew what I was doing; I just figured it out as we went along.

What is your role at Prolific, and what are the responsibilities of the data team?

I've been through many job titles in my time at Prolific. Over the four years that I've been here, I've been everything from a data analyst to a growth analyst, and everything in between (including cold calling people in sales and marketing – that was an interesting time!). After we became part of the YC batch in 2019, I started settling into the Data Lead role. Since then, I've spent most of my time building out our data stack and hiring a data team to serve our product needs, all while supporting our business intelligence and financial reporting divisions.

The data team manages all things data at Prolific. When we break it down, we have three main functions: managing the infrastructure, providing ad-hoc analytics support, and building data products. Even though we’ve separated our duties into “functions”, there's a lot of blurring between them – especially in our small team of six.

How would you describe the mission of Prolific’s data team?

If there's one strategy that we live by, it's this: Self-serve access to data is crucial for decision making. With a small team like ours, there's no feasible way that we can tend to the constant demand across the company, at least not at the level of data-drivenness that we want to see. 

To compensate, we've set up tools like Metabase that give people hands-on access to data, in conjunction with tools like dbt to transform the data in our data warehouse – both intuitive and easy for anyone to work with. With that setup, we can serve 20% of requests deeply, and the other 80% can be self-served, giving our internal customers the freedom to make those decisions without feeling like they’ll be slowed down waiting for someone to resolve their request.
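As a rough sketch of what that self-serve layer can look like (the model and column names here are hypothetical, not Prolific’s actual code), a dbt model along these lines turns raw records into a tidy table that anyone can explore in Metabase:

```sql
-- models/marts/studies_daily.sql -- hypothetical dbt model backing self-serve exploration
-- Rolls raw study records up into one row per day, ready to browse in Metabase.
select
    date_trunc('day', completed_at)   as study_date,          -- day the study finished
    count(*)                          as studies_completed,
    count(distinct researcher_id)     as active_researchers
from {{ ref('stg_studies') }}         -- assumed staging model, for illustration only
where completed_at is not null
group by 1
```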

Prolific’s operational analytics stack

Next up, we asked Jim to deep dive on an impactful operational analytics use case with us.

Show & Tell: Getting key user product metrics and actions in Hubspot for Sales

Reverse ETL became a core tool for our team over the last couple of months, especially once we started setting up our sales team with HubSpot as our CRM. We wanted to send a summary of customer data over to HubSpot so that our salespeople could make informed decisions and interact with potential or actual customers at the opportune moment.

Snowplow was really the snowball that got our whole data stack rolling back in 2020 when we had an aspirational data dream, a smaller team, and the early stages of our infrastructure. From there, we started to do transformations in Redshift and dbt, and then were able to send the results of those transformations via Census to HubSpot. The end result is basically like having a friend or consultant with all the data you could need standing over your shoulder. 
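To make that pipeline a little more concrete: Snowplow’s Redshift loader lands raw events in the atomic.events table, and a small dbt staging model along these lines (the column choices are illustrative, not Prolific’s actual code) picks out the fields the downstream HubSpot-facing models rely on:

```sql
-- models/staging/stg_snowplow_events.sql -- illustrative staging model over Snowplow data
-- Keeps only the fields the downstream HubSpot-facing models need.
select
    event_id,
    user_id          as researcher_id,   -- assumes the tracker sets user_id to the researcher's id
    event_name,
    collector_tstamp as event_at
from atomic.events
where user_id is not null
```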

Many of our customers are affiliated with universities and, as you can imagine, there are dozens, if not hundreds of researchers that might want to use Prolific. Previously, we were treating all of these researchers as individuals, but when their data lands in HubSpot, we want to be able to pull them together and group them as part of an organization.

We already had the data models, but what we were lacking was the ability to sync the majority of fields in those data models over to HubSpot. So, we built new data models that aggregate data on researchers at both the contact and company levels, combining first and last key user actions.

Think about first actions: how long does it take for someone to display purchase intent or buy their first item? These are good indications of whether someone is likely to convert into a high-value customer. Are they going to convert fast? Do they need much support?

To give us the whole picture, we also want the timestamps of their last actions to trigger sales workflows in HubSpot. For instance, if someone’s last login was over two weeks ago, we can trigger a workflow each morning that prompts a salesperson to reach out, or just review that account to check that everything is healthy.
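A contact-level model behind this could look something like the sketch below (event names and columns are assumptions for illustration, not Prolific’s real schema). Census syncs the result onto HubSpot contact properties, where a workflow filtered on something like days_since_last_login > 14 can queue the account for review:

```sql
-- models/marts/hubspot_contact_usage.sql -- hypothetical contact-level rollup
-- One row per researcher: first and last key actions, plus how stale their last login is.
select
    researcher_id,
    min(case when event_name = 'study_published' then event_at end) as first_study_published_at,
    max(case when event_name = 'login'           then event_at end) as last_login_at,
    datediff(
        day,
        max(case when event_name = 'login' then event_at end),
        current_date
    )                                                                as days_since_last_login
from {{ ref('stg_snowplow_events') }}
group by 1
```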

Example of how product usage metrics appear at a glance on the company object in HubSpot for the sales team

Along with the first and last user actions, we've got aggregate numbers that tell us about the health of the account. How much money have they made us, or how much have they added to their wallet? How many studies have they conducted on Prolific? What country are they based in? 
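Rolled up to the company level, that kind of summary might look roughly like this (again, the table and column names are placeholders for illustration); Census then maps each column to a property on the HubSpot company object, which is what powers the at-a-glance view above:

```sql
-- models/marts/hubspot_company_usage.sql -- hypothetical company-level rollup
-- Aggregates researcher activity up to the organisation so account health is visible at a glance.
select
    o.organisation_id,
    o.country,
    sum(r.lifetime_revenue)  as total_revenue,
    sum(r.wallet_topups)     as total_wallet_topups,
    sum(r.studies_run)       as total_studies,
    max(c.last_login_at)     as last_login_at
from {{ ref('organisations') }}         o
join {{ ref('researchers') }}           r on r.organisation_id = o.organisation_id
join {{ ref('hubspot_contact_usage') }} c on c.researcher_id = r.researcher_id
group by 1, 2
```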

Basically, we want to give our young sales team as much context as possible about who they're interacting with. They need to be able to pinpoint a customer and go, “Okay, this person has literally used us hundreds of times.” That’s going to change the way that they talk to them, how they interact, their use of language, and the way they deal with different concepts. In our case, this is all really important because we're a seven-year-old company with a three-month-old sales team.

What impact did this use case have on your business?

Since integrating this workflow, the level of visibility we now have over customers’ interactions with our platform is huge. Everything in a product comes down to how well you understand the customer's psychology and, unless you understand their thinking and their behavior, you're flying blind.

Although that data has always been accessible to our salespeople, it's usually stored in another app or another tab. Now, it's right there in HubSpot, ready to automatically trigger workflows, so we can use it to strike at the opportune moment. We can see if someone has high purchase intent, and we can be there with an email saying, “Hey, I saw you were doing this on the platform yesterday. Is there anything I can do to help you get it over the line?” That kind of time-sensitive action is what makes the HubSpot plus data warehouse combination work so well.

We've closed more than 10 new, big logos in the short time that we’ve been up and running with this data in HubSpot. For us, the added visibility has allowed us to unlock the power of our sales team; they’re no longer working blind. Every morning they know what their customers are doing because the data lands in HubSpot overnight.

Smoothing out our data pipelines and making reusable, scalable self-serve a reality has freed up the team to focus on new data growth frontiers to help support the business better. 

Building out these initial use cases has also fueled even bigger data dreams for 2023 and beyond, including thinking of ways we can dive deeper into the behavioral data of our researchers to create Customer Behavioral Profiles for churn prediction, lifespan analysis, and reengagement opportunities. That’d be incredibly powerful since, especially with sales and customer success, timing is everything.

What advice would you give to other small data teams trying to prioritize their work?

First: Get everyone on board with the importance of building Customer Behavioral Profiles early on. It doesn’t matter what team you’re on; the entire business is built around customers and the data they provide. The only way to boost engagement with those customers is by serving contextual, personalized experiences based on the descriptive richness of their behavioral data: how they act on your site and in your product, for how long, and in what patterns.

Second: I think there's a tendency within small data teams to try to solve every single problem at once. The fact is, building out a good data stack takes some time, but it saves you so much time in the future. There's a reason that tools like dbt are considered “best-in-class” – they take your raw, messy data, transform it, and make it easy to send to other places without messy connections or unmaintainable infrastructure. You need to get a data warehouse, put dbt on top of it, and start doing very basic visualization for end users.

Follow that up by going deep with a specific team. Whether that means working with the sales team to set up what they need day-to-day to sell successfully, or working with your finance team on water-tight monthly reconciliation, there's going to be some area of your business that sorely needs data. You can't fight all the fires at once, so pick a specific area, work really closely with those stakeholders, understand that problem space, and build data products that serve their needs.

The knowledge you gain and the infrastructure you put in place along the way will serve you across other areas of the business – focus on delivering value in one specific area rather than drowning in the data demands of a small business.

You don’t notice each day that you’re getting more and more self-sufficient and flexible. But what we can do now with our full stack (and with all these tools in combination) was just unimaginable in 2020. 

✨ Learn more about how Prolific uses Census for Operational Analytics and dive deep into how they chose their stack in our webinar with Jim. 👇

🤩 Want to be featured in our next Select Stars profile? Apply here and we’ll be in touch soon!
