Marketing attribution: The challenges of current software solutions | Census

Sylvain Giuliani
Sylvain Giuliani May 09, 2023

Syl is the Head of Growth & Operations at Census. He's a revenue leader and mentor with a decade of experience building go-to-market strategies for developer tools. San Francisco, California, United States

Ah… Marketing attribution — the key to optimizing marketing strategies and improving marketing spend ROI. 😌

Sounds perfect, right?

There’s just one, teensy problem: Current marketing attribution software solutions have marketers feeling unsatisfied at best. In fact, according to a survey by QueryClick, 58% of marketers say their current attribution models are an obstacle to implementing marketing activities. And only 11% of marketers are happy with their attribution solutions. 😬 While these existing software solutions do provide some valuable insights, they fall short of meeting today’s dynamic marketing needs.

In this blog post, we'll explore how to overcome the limitations of current marketing attribution software solutions by modeling marketing data in plain SQL.

The limitations of current marketing attribution software solutions

Like we said: Most marketers are dissatisfied with the current marketing attribution software solutions.

One of their main limitations? They’re often too rigid and inflexible. Existing solutions rely on predefined rules and models to assign credit to each touchpoint, so they (typically) lead to inaccurate and incomplete results. They simply don’t provide enough transparency into how the credit is assigned, making it difficult for marketers to understand and interpret the results.

Another roadblock is that current marketing attribution solutions don’t integrate very well with other marketing tools and platforms. 🚧 Among other things, that disconnect makes it difficult for marketers to bring together data from different sources and gain a comprehensive view of their marketing efforts.

So… if marketing attribution — when done correctly — improves marketing campaigns and ROI, but current solutions fail to meet expectations, where do you turn? 👂 Pssst… You can model your marketing data in plain SQL.

Why use SQL for marketing attribution?

SQL is a powerful tool for uniting data from multiple sources and applying sophisticated algorithms to assign credit to each touchpoint.

And using it for marketing attribution has several advantages over traditional marketing attribution software solutions.👇

  1. SQL provides more flexibility and control over the attribution process. You can define your own attribution rules and models, and easily modify or adjust them as needed.
  2. SQL provides greater transparency into the attribution process. Now, you can see how credit is assigned to each touchpoint, so you can understand the underlying logic behind the attribution rules and models.
  3. SQL can help us integrate data from multiple sources and gain a comprehensive view of our marketing efforts. By joining your data from web analytics, CRMs, and marketing automation, you can analyze the effectiveness of different marketing channels and campaigns.

All about attribution modeling

Before we dive into the details of how to model marketing data in SQL, let's review the different types of attribution modeling:

  • First touch: Assigns all the credit to the first touchpoint that led to a conversion.
  • Last touch: Assigns all the credit to the last touchpoint that led to a conversion.
  • Linear: Assigns equal credit to all touchpoints that led to a conversion.
  • U shape: Assigns more credit to the first and last touchpoints, and less credit to the touchpoints in between.
  • Time decay: Assigns more credit to the touchpoints that occurred closer in time to the conversion.

Of course, each attribution model has its own strengths and weaknesses, so the "best” model will depend on the specific needs and goals of your organization.

Building an attribution model in SQL

Let’s cut to the chase: To build an attribution model in SQL, you need to bring together data from multiple sources, including web analytics, CRM, and marketing automation. You also need to define your own attribution rules and models in order to apply sophisticated algorithms to assign credit to each touchpoint.

Here’s a basic table representation of the data schema involved:


.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
 overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
 font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-cly1{text-align:left;vertical-align:middle}
.tg .tg-1wig{font-weight:bold;text-align:left;vertical-align:top}



 
   Table
   Fields
 


 
   sessions
   session_id, user_id, start_time, end_time, source_channel
 
 
   touches
   touch_id, user_id, timestamp, channel, campaign
 
 
   conversions
   conversion_id, user_id, conversion_time, revenue
 


In this example, we’ll use the schema in this table 👆 to build a view that assigns credit to each touchpoint based on our set of predefined rules. Then, once we analyze the effectiveness of different marketing channels and campaigns, we can optimize our marketing strategy and improve the ROI of our marketing spend.

Let’s write some SQL

Building a query takes four basic steps:

  1. Gather your required data sources
  2. Find all sessions before conversion
  3. Calculate the total sessions and the session index
  4. Allocate points

Here’s what each step looks like using actual SQL queries.

1. Gather your required data sources

The following snippet creates three CTEs for our three data sources — session, touch (can be any events), and conversion data — from their respective tables. We’re just prepping the data at this point, but this is what it looks like.

WITH sessions AS (
 SELECT
   session_id,
   user_id,
   start_time,
   end_time,
   source_channel
 FROM
   sessions_table
),

touches AS (
 SELECT
   touch_id,
   user_id,
   timestamp,
   channel,
   campaign
 FROM
   touches_table
),

conversions AS (
 SELECT
   conversion_id,
   user_id,
   conversion_time,
   revenue
 FROM
   conversions_table
)


2. Find all sessions before conversion

Now, we have our source data, so let’s get crunching. Start by creating a view that combines session and conversion data. The goal is to identify all sessions that occurred before a conversion, then join the session and conversion data together. Here’s how the query looks:

WITH sessions_before_conversion AS (
 SELECT
   s.session_id,
   s.user_id,
   s.start_time,
   s.end_time,
   s.source_channel,
   c.conversion_time
 FROM
   sessions s
   JOIN conversions c ON s.user_id = c.user_id AND s.start_time < c.conversion_time
)


3. Calculate the total sessions and the session index

With the session data in hand, we can assign the weight to each of these sessions based on the number of touches using the following query:

WITH touches_with_session AS (
 SELECT
   t.touch_id,
   t.user_id,
   t.timestamp,
   t.channel,
   t.campaign,
   s.session_id
 FROM
   touches t
   JOIN sessions_before_conversion s ON t.user_id = s.user_id AND t.timestamp BETWEEN s.start_time AND s.end_time
),

touchpoints AS (
 SELECT
   user_id,
   session_id,
   channel,
   campaign,
   COUNT(DISTINCT touch_id) AS touches
 FROM
   touches_with_session
 GROUP BY
   user_id,
   session_id,
   channel,
   campaign
),

touchpoint_weights AS (
 SELECT
   *,
   ROW_NUMBER() OVER (
     PARTITION BY user_id, session_id
     ORDER BY touches DESC
   ) AS touchpoint_rank
 FROM
   touchpoints
)


4. Allocate points

Allocate points to each touchpoint based on the model you want. Here, we are assigning 100% of the credit to the first touchpoint and only 50% of the credit to the second touchpoint with the following query.

WITH attribution AS (
 SELECT
   user_id,
   session_id,
   channel,
   campaign,
   touches,
   CASE touchpoint_rank
     WHEN 1 THEN 1
     WHEN 2 THEN 0.5
     ELSE 0
   END AS attribution_weight
 FROM
   touchpoint_weights
 WHERE
   touchpoint_rank <= 2
)


5. Join in revenue and ad spend data

Use the allocated credit to analyze the effectiveness of different marketing channels and campaigns based on the revenue they generated plus the cost of your ads. The final output calculates the ROI for each marketing channel and campaign based on the attributed credit and revenue data.

WITH attribution AS (
 ...
),

attribution_by_channel_campaign AS (
 SELECT
   channel,
   campaign,
   SUM(attribution_weight) AS attribution
 FROM
   attribution
 GROUP BY
   channel,
   campaign
),

revenue_by_channel_campaign AS (
 SELECT
   channel,
   campaign,
   SUM(revenue) AS revenue
 FROM
   conversions
 GROUP BY
   channel,
   campaign
)

SELECT
 abc.channel,
 abc.campaign,
 abc.attribution,
 rbc.revenue,
 abc.attribution / rbc.revenue AS roi
FROM
 attribution_by_channel_campaign abc
 JOIN revenue_by_channel_campaign rbc ON abc.channel = rbc.channel AND abc.campaign = rbc.campaign


And when you combine the above five steps, you get an attribution model in SQL that assigns credit to each touchpoint based on a set of predefined rules. 🙌  Putting it all together, you get the full query.


WITH sessions_before_conversion AS (
 SELECT
   s.session_id,
   s.user_id,
   s.start_time,
   s.end_time,
   s.source_channel,
   c.conversion_time
 FROM
   sessions s
   JOIN conversions c ON s.user_id = c.user_id AND s.start_time < c.conversion_time
),

touches_with_session AS (
 SELECT
   t.touch_id,
   t.user_id,
   t.timestamp,
   t.channel,
   t.campaign,
   s.session_id
 FROM
   touches t
   JOIN sessions_before_conversion s ON t.user_id = s.user_id AND t.timestamp BETWEEN s.start_time AND s.end_time
),

touchpoints AS (
 SELECT
   user_id,
   session_id,
   channel,
   campaign,
   COUNT(DISTINCT touch_id) AS touches
 FROM
   touches_with_session
 GROUP BY
   user_id,
   session_id,
   channel,
   campaign
),

touchpoint_weights AS (
 SELECT
   *,
   ROW_NUMBER() OVER (
     PARTITION BY user_id, session_id
     ORDER BY touches DESC
   ) AS touchpoint_rank
 FROM
   touchpoints
),

attribution AS (
 SELECT
   user_id,
   session_id,
   channel,
   campaign,
   touches,
   CASE touchpoint_rank
     WHEN 1 THEN 1
     WHEN 2 THEN 0.5
     ELSE 0
   END AS attribution_weight
 FROM
   touchpoint_weights
 WHERE
   touchpoint_rank <= 2
),

attribution_by_channel_campaign AS (
 SELECT
   channel,
   campaign,
   SUM(attribution_weight) AS attribution
 FROM
   attribution
 GROUP BY
   channel,
   campaign
),

revenue_by_channel_campaign AS (
 SELECT
   channel,
   campaign,
   SUM(revenue) AS revenue
 FROM
   conversions
 GROUP BY
   channel,
   campaign
)

SELECT
 abc.channel,
 abc.campaign,
 abc.attribution,
 rbc.revenue,
 abc.attribution / rbc.revenue AS roi
FROM
 attribution_by_channel_campaign abc
 JOIN revenue_by_channel_campaign rbc ON abc.channel = rbc.channel AND abc.campaign = rbc.campaign


Limitations of SQL-based attribution modeling

While modeling marketing data in SQL can help us overcome the limitations of current marketing attribution software solutions, there are still some limitations to be aware of.

  1. Positional attribution is not a perfect representation of human decision-making. In reality, customers may be influenced by multiple touchpoints, so their decision-making process may be more complex than can be captured by a simple attribution model.
  2. The quality of the source data is often low. Web traffic is often poorly tagged with UTM parameters, and CRM data may be incomplete or inaccurate. Poor source data quality can make assigning credit to each touchpoint difficult.

Despite these few limitations, modeling marketing data in SQL helps you gain deeper insights into the effectiveness of our marketing campaigns. And by experimenting with different attribution models and analyzing the results, you can continuously improve marketing attribution and drive better business outcomes. 🚀

The future of marketing attribution is SQL

Sure, marketing attribution is a game-changer for marketers. But if (and when) current software solutions fall short of meeting your evolving needs, start modeling your marketing data in plain SQL — and gain deeper insights into the effectiveness of your marketing campaigns along the way.

Remember: The SQL code example we ran through is just one way to model marketing data. There are so many other approaches you can use to gain deeper insights into the effectiveness of your marketing campaigns and improve ROI. It all just comes down to the specific needs of your organization.

💪 Want to learn how Census can empower your marketing attribution mission? Book a demo with a Census product specialist.

Related articles

Customer Stories
Built With Census Embedded: Labelbox Becomes Data Warehouse-Native
Built With Census Embedded: Labelbox Becomes Data Warehouse-Native

Every business’s best source of truth is in their cloud data warehouse. If you’re a SaaS provider, your customer’s best data is in their cloud data warehouse, too.

Best Practices
Keeping Data Private with the Composable CDP
Keeping Data Private with the Composable CDP

One of the benefits of composing your Customer Data Platform on your data warehouse is enforcing and maintaining strong controls over how, where, and to whom your data is exposed.

Product News
Sync data 100x faster on Snowflake with Census Live Syncs
Sync data 100x faster on Snowflake with Census Live Syncs

For years, working with high-quality data in real time was an elusive goal for data teams. Two hurdles blocked real-time data activation on Snowflake from becoming a reality: Lack of low-latency data flows and transformation pipelines The compute cost of running queries at high frequency in order to provide real-time insights Today, we’re solving both of those challenges by partnering with Snowflake to support our real-time Live Syncs, which can be 100 times faster and 100 times cheaper to operate than traditional Reverse ETL. You can create a Live Sync using any Snowflake table (including Dynamic Tables) as a source, and sync data to over 200 business tools within seconds. We’re proud to offer the fastest Reverse ETL platform on the planet, and the only one capable of real-time activation with Snowflake. 👉 Luke Ambrosetti discusses Live Sync architecture in-depth on Snowflake’s Medium blog here. Real-Time Composable CDP with Snowflake Developed alongside Snowflake’s product team, we’re excited to enable the fastest-ever data activation on Snowflake. Today marks a massive paradigm shift in how quickly companies can leverage their first-party data to stay ahead of their competition. In the past, businesses had to implement their real-time use cases outside their Data Cloud by building a separate fast path, through hosted custom infrastructure and event buses, or piles of if-this-then-that no-code hacks — all with painful limitations such as lack of scalability, data silos, and low adaptability. Census Live Syncs were born to tear down the latency barrier that previously prevented companies from centralizing these integrations with all of their others. Census Live Syncs and Snowflake now combine to offer real-time CDP capabilities without having to abandon the Data Cloud. This Composable CDP approach transforms the Data Cloud infrastructure that companies already have into an engine that drives business growth and revenue, delivering huge cost savings and data-driven decisions without complex engineering. Together we’re enabling marketing and business teams to interact with customers at the moment of intent, deliver the most personalized recommendations, and update AI models with the freshest insights. Doing the Math: 100x Faster and 100x Cheaper There are two primary ways to use Census Live Syncs — through Snowflake Dynamic Tables, or directly through Snowflake Streams. Near real time: Dynamic Tables have a target lag of minimum 1 minute (as of March 2024). Real time: Live Syncs can operate off a Snowflake Stream directly to achieve true real-time activation in single-digit seconds. Using a real-world example, one of our customers was looking for real-time activation to personalize in-app content immediately. They replaced their previous hourly process with Census Live Syncs, achieving an end-to-end latency of <1 minute. They observed that Live Syncs are 144 times cheaper and 150 times faster than their previous Reverse ETL process. It’s rare to offer customers multiple orders of magnitude of improvement as part of a product release, but we did the math. Continuous Syncs (traditional Reverse ETL) Census Live Syncs Improvement Cost 24 hours = 24 Snowflake credits. 24 * $2 * 30 = $1440/month ⅙ of a credit per day. ⅙ * $2 * 30 = $10/month 144x Speed Transformation hourly job + 15 minutes for ETL = 75 minutes on average 30 seconds on average 150x Cost The previous method of lowest latency Reverse ETL, called Continuous Syncs, required a Snowflake compute platform to be live 24/7 in order to continuously detect changes. This was expensive and also wasteful for datasets that don’t change often. Assuming that one Snowflake credit is on average $2, traditional Reverse ETL costs 24 credits * $2 * 30 days = $1440 per month. Using Snowflake’s Streams to detect changes offers a huge saving in credits to detect changes, just 1/6th of a single credit in equivalent cost, lowering the cost to $10 per month. Speed Real-time activation also requires ETL and transformation workflows to be low latency. In this example, our customer needed real-time activation of an event that occurs 10 times per day. First, we reduced their ETL processing time to 1 second with our HTTP Request source. On the activation side, Live Syncs activate data with subsecond latency. 1 second HTTP Live Sync + 1 minute Dynamic Table refresh + 1 second Census Snowflake Live Sync = 1 minute end-to-end latency. This process can be even faster when using Live Syncs with a Snowflake Stream. For this customer, using Census Live Syncs on Snowflake was 144x cheaper and 150x faster than their previous Reverse ETL process How Live Syncs work It’s easy to set up a real-time workflow with Snowflake as a source in three steps: