Census cheat sheets

Email Segmentation

Email Segmentation and Scoring SQL Cheat sheet

Email segmentation is among the most common types of customer analysis, especially for B2B. It gives us insight into two things: how each individual customer is doing (an email is a unique identifier for the human owner of that email), and how different businesses or “accounts” are doing (email domains are a rough but identifier of a business.

This cheatsheet is helpful for analysis but it’s also meant to help you think about how you could apply some simple SQL analytics to operational tools like Hubspot, Salesforce, and Zendesk. We call this Operational Analytics.

Time to open up your SQL editor and pull up a table has one row for each email contact. Keep in mind this SQL is based on Snowflake you can easily modify this to the SQL flavor of your choice.

1. Decomposing email addresses into parts

Let's start simple. Here's how to break down an email address into each of its three parts: the name, domain name and top level domain (TLD.)

Why would you use these parts? Here are a few reasons:

  • The name without the domain can be used as a default user name suggestion and provide clues for the email owner’s actual name.
  • The domain name can be used to group users into accounts and give pretty good estimates of how many users from a domain your app might have.
  • The TLD can be a good indicator of the quality, or lack thereof, of a given lead. Think about it, .edu TLDs most often says its a student playing around while a .io domain signals that it could be a lead from a cool new startup.
SELECT PROPERTY_EMAIL,
 split_part(PROPERTY_EMAIL, '@', 1) username,
 split_part(PROPERTY_EMAIL, '@', - 1) domain_name,
 split_part(PROPERTY_EMAIL, '.', - 1) top_level_domain
FROM FIVETRAN.HUBSPOT.CONTACT

2. Differentiate between personal and professional email accounts

This is admittedly an imperfect rule of thumb but it does cover much more than 90% personal email addresses, especially within North America.

This case expression checks against Gmail, of course, which accounts for +50% personal email accounts in the US and the biggest email providers around the world such as Hotmail, Apple, Microsoft, Yandex, Sina, QQ, and NetEase (CN).

SELECT CASE 
  WHEN split_part(PROPERTY_EMAIL, '@', - 1) ilike ANY (
    'gmail.%',
    'yahoo.%',
    'hotmail.%',
    'aol.%',
    'outlook.%',
    'msn.%',
    'yandex.%',
    'mail.ru',
    'icloud.%',
    'live.%',
    '163.%',
    'qq.%',
    'sina.%'
    )
   THEN 'PERSONAL'
  ELSE 'BUSINESS'
  END personal_or_busines
FROM FIVETRAN.HUBSPOT.CONTACT

3. Order emails within a domain by their creation date

The rank window function allows you to order a group of rows by an ordering field. In this case, the “property_created” field represents when a contact was added to Hubspot (but this could be when they signed up for your app etc.)

This will return a 1 for the first person from a given email domain and 2,3,4 etc for the second, third, fourth, etc depending on how many contacts there are for a given domain.

Keep in mind that this field will be a lot more useful for professional emails compared to personal emails.

SELECT PROPERTY_EMAIL,
 rank() OVER (
  PARTITION BY split_part(PROPERTY_EMAIL, '@', - 1) ORDER BY PROPERTY_CREATEDATE ASC
  ) domain_email_rank
FROM FIVETRAN.HUBSPOT.CONTACT

Ready to send this data to your CRM?

Sync data to 10 destination fields from your data warehouse across unlimited destinations*. Start operationalizing your data in less time than it takes to write a Jira ticket.

4. Bucket trials by “noise,” “first user,” and “repeat users”

This bucketing logic brings together all the ideas above to create a remarkably powerful yet simple segmentation for B2B PLG apps. The case statement buckets users into “PERSONAL”, “DOMAIN FIRST USER”, and “DOMAIN REPEAT USER.”

Put this into action as a prioritization method for sales and support teams. First check the conversion rate from each of these buckets and decide how you want to engage with them. Often a sign up from a personal email address is not likely to convert and probably not worth your sales or support teams’ time. A first user from a domain might deserve a little more attention from support to help them get up and running and repeat users might signal that this account should start paying!

SELECT PROPERTY_EMAIL,
 CASE 
  WHEN split_part(PROPERTY_EMAIL, '@', - 1) ilike ANY (
    'gmail.%',
    'yahoo.%',
    'hotmail.%',
    'aol.%',
    'outlook.%',
    'msn.%',
    'yandex.%',
    'mail.ru',
    'icloud.%',
    'live.%',
    '163.%',
    'qq.%',
    'sina.%'
    )
   THEN 'PERSONAL'
  WHEN rank() OVER (
    PARTITION BY split_part(PROPERTY_EMAIL, '@', - 1) ORDER BY PROPERTY_CREATEDATE ASC
    ) = 1
   THEN 'DOMAIN FIRST USER'
  WHEN rank() OVER (
    PARTITION BY split_part(PROPERTY_EMAIL, '@', - 1) ORDER BY PROPERTY_CREATEDATE ASC
    ) > 1
   THEN 'DOMAIN REPEAT USER'
  END email_type
FROM FIVETRAN.HUBSPOT.CONTACT

5. Get the first email address from each email's domain

Sometimes it’s nice to know who brought an account to your app. This person is often the person who will end up championing your product for their business. With this property, you can push this identifier (or the actual contact ID) to a contact record in your CRM to help sales and support people quickly draw associations with an account user’s champion. This will help everybody think about account usage rather than user usage.

SELECT PROPERTY_EMAIL,
 first_value(PROPERTY_EMAIL) OVER (
  PARTITION BY split_part(PROPERTY_EMAIL, '@', - 1) ORDER BY PROPERTY_CREATEDATE ASC
  ) domain_first_email
FROM FIVETRAN.HUBSPOT.CONTACT

6. Quickly search Linkedin for a given contact

“80% of the time it works all the time.” But when it works it saves a lot of time.This transformation creates a search query to paste into Google to find what is likely the contacts first and last name and their domain name. Quite often, this is enough information to provide to Google to retrieve the contact’s Linkedin profile. From there, you can learn a lot about the person and then manually score or bucket them, or develop more personalized engagements for them.

Try it for yourself: site:linkdedin.com firstname lastname company.com

SELECT PROPERTY_EMAIL,
 concat_ws(' ', 'site:linkedin.com', regexp_replace(split_part(PROPERTY_EMAIL, '@', 1), '\\.', ' '), split_part(PROPERTY_EMAIL, '@', - 1)) linkedin_search
FROM FIVETRAN.HUBSPOT.CONTACT

Now that you have six new ways to segment your customers, users, or leads, what more can you do? The obvious answer is to sum the distinct number of users in each bucket, or you can go further and calculate metrics like customer lifetime value, lead scores, or days since last visit.
After that?

Time to try Census and start syncing your newly discovered dimensions to your operational systems for free!

From your data warehouse to your favorite apps. Without code or engineers.

Start using Census today by booking a demo with one of our experts.

Illustration of a data stack from multiple data sources.