
OA Digest Year in Review: How our favorite community content stacked up in 2022

Parker Rogers | December 20, 2022

Parker is a data community advocate at Census with a background in data analytics. He's interested in finding the best and most efficient ways to make use of data and in helping other data folks in the community grow their careers.

Throughout 2022, I shared 125 resources in the OA Digest Newsletter. 🤯 These resources came in various forms (blogs, videos, podcasts, events, etc.), but they came together to serve a single purpose: Help data professionals gain practical knowledge to level up their careers. 📈

To wrap up the year (please, not another Spotify pun!), I wanted to highlight the 10 most popular resources from all of my newsletters.


So, without further ado, thanks for being an OA Digest reader, and enjoy the top 10 (in no particular order). 👇 

Blogs and Technical Tutorials

SQL Cheat Sheet for Data Science by Srikanth Kb (data scientist @ Tesla)

Whether you’re a SQL veteran or a beginner, it’s always nice to have a SQL cheat sheet on hand. This blog is your ticket to remembering all the basic SQL syntax and semantics. Hopefully, it saves you a few Google searches!
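
To give you a taste, here's the kind of bread-and-butter query shape a cheat sheet like this covers (the table and column names below are made up for illustration):

    -- Filter, aggregate, filter the aggregates, sort, limit: the everyday SQL skeleton
    SELECT region,
           COUNT(*)    AS orders,
           SUM(amount) AS revenue
    FROM orders
    WHERE order_date >= '2022-01-01'
    GROUP BY region
    HAVING SUM(amount) > 10000
    ORDER BY revenue DESC
    LIMIT 10;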


The best SQL template for Customer Lifetime Value by Josh Berry (data scientist @ Rasgo)

You know what constantly reminds me that the world isn’t all that bad? People sharing high-value, applicable knowledge for free. Josh Berry does just that in his recent blog, The best SQL template for Customer Lifetime Value.

Josh has been building CLV models for over a decade. Not only does he share the templated SQL and Jinja code, but he carefully explains the logic behind it all. Give this a read if you’re looking to up your SQL knowledge or gain a better understanding of CLV modeling.
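
Josh's actual template (Jinja and all) lives in his post, but here's a rough sketch of the kind of per-customer inputs a CLV model starts from (hypothetical orders table, not his code):

    -- Per-customer purchase history: the raw ingredients for a CLV model
    SELECT customer_id,
           MIN(order_date) AS first_order,
           COUNT(*)        AS total_orders,
           SUM(amount)     AS total_revenue,
           SUM(amount) / COUNT(DISTINCT DATE_TRUNC('month', order_date)) AS avg_monthly_revenue
    FROM orders
    GROUP BY customer_id;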


SQL Hack: How to get first value based on time without a JOIN or a Window Function? by Ahmed Elsamadisi (CEO @ Narrator AI)

There’s nothing worse than having to write long, complex queries to answer simple questions. Sure, sometimes it’s unavoidable, but other times you simply need to leverage some handy-dandy, pre-built SQL functions.

In this article, Ahmed shows you how to replace subqueries and self-joins with simple SQL functions like CONCAT.
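
The gist of the trick, as I understand it (the full walkthrough is in Ahmed's post, and the table below is hypothetical): concatenate the timestamp with the value you care about, take the MIN, then split the value back out.

    -- First plan each customer picked, by time, with no join and no window function.
    -- Assumes activity_ts casts to an ISO-8601 string so it sorts correctly as text.
    SELECT customer_id,
           SPLIT_PART(MIN(CONCAT(activity_ts, '|', plan_name)), '|', 2) AS first_plan
    FROM subscription_events
    GROUP BY customer_id;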



Using SQL to Summarize A/B Experiment Results by Adam Stone (sr. analytics engineer @ Netlify)

Even if your responsibilities fall outside the realm of A/B testing, I highly recommend giving Adam’s blog a read. You can learn from his disciplined planning and execution process, as well as his high-quality SQL queries, then apply them to a wide variety of SQL projects. In this blog, Adam shares:

🪄 A practical example of A/B testing

🪄 Quality SQL queries for joining, de-duping, and safeguarding user-level tables

🪄 How to calculate summary metrics accurately (hint: double-check your denominators 😉; see the sketch just below)
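
To make that denominator point concrete, here's a rough sketch of the de-dupe-then-summarize pattern (Snowflake-flavored SQL using QUALIFY; the tables are hypothetical, and Adam's real queries are in his post):

    WITH assignments AS (
        -- De-dupe: keep each user's first assignment so nobody lands in two variants
        SELECT user_id, variant
        FROM ab_assignments
        QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY assigned_at) = 1
    )
    SELECT a.variant,
           COUNT(*)         AS users_assigned,   -- denominator: everyone assigned, not just converters
           COUNT(c.user_id) AS users_converted,
           COUNT(c.user_id) / COUNT(*)::FLOAT AS conversion_rate
    FROM assignments a
    LEFT JOIN (SELECT DISTINCT user_id FROM conversions) c
           ON a.user_id = c.user_id
    GROUP BY a.variant;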



Thinking of Analytics Tools as Products by Emily Thompson (lead - core product data science @ Gusto)

Emily is a two-pronged professional, dabbling in both data and product. In this blog, she shares her best practices for developing self-service data tools for stakeholders (including two best practices I, personally, hadn't previously considered): 

  1. Create an internal marketing plan. This will help your stakeholders a) know that the tool exists and b) know how to use it. 
  2. Plan for customer support. Set up a process that makes it easy for stakeholders to contact you, so your team can resolve their issues in a timely manner. 


5 SQL Functions Every Analytics Engineer Needs to Know by Madison Schott (analytics engineer @ Winc)

You can always count on at least one SQL tutorial in every OA Digest newsletter. You know those 50+ line SQL queries you used to write? And then, one day, "poof" 🪄 you learn about a function that solves the same thing in 4 lines? In this blog, Madison shares the common SQL functions she uses to solve complex problems.

Bonus: SQL code is included in each example so you can not only learn, but also apply them yourself in a snap. 👌
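
The specific five are in Madison's post, but here's the flavor of that "poof" moment with one hypothetical example: LAG turns what used to be a self-join into a single expression.

    -- Days since each customer's previous order, no self-join required (Snowflake-style DATEDIFF)
    SELECT customer_id,
           order_date,
           DATEDIFF('day',
                    LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date),
                    order_date) AS days_since_prev_order
    FROM orders;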


Videos and On-Demand Workshops

SQL Patterns Every Analyst Should Know w/ Ergest Xheblati (data architect & author)

This is the first installment of Ergest’s workshop series. In it, Ergest gets hands-on with SQL and shares key principles to help you take your SQL skill from intermediate to advanced. More specifically, he teaches:

🎯 Query composition patterns - How to make your complex queries shorter, more legible, and more performant

🎯 Query maintainability patterns - Constructing CTEs that can be reused. In software engineering, it's called the DRY principle (don't repeat yourself)

🎯 Query robustness patterns - Constructing queries that don't break when the underlying data changes in unpredictable ways

🎯 Query performance patterns - Make your queries faster (and cheaper) regardless of the specific database you’re using.

One of my many takeaways from the event... CTEs > Subqueries👇

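Here's a minimal before-and-after of that takeaway (my own toy example on a hypothetical orders table, not a slide from the workshop):

    -- Subquery version: the aggregation is buried and can't be reused
    SELECT *
    FROM (
        SELECT customer_id, SUM(amount) AS revenue
        FROM orders
        GROUP BY customer_id
    ) customer_revenue
    WHERE revenue > 1000;

    -- CTE version: the same step gets a name, reads top to bottom, and can be referenced again (DRY)
    WITH customer_revenue AS (
        SELECT customer_id, SUM(amount) AS revenue
        FROM orders
        GROUP BY customer_id
    )
    SELECT *
    FROM customer_revenue
    WHERE revenue > 1000;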


How to design a good tracking plan by Timo Dechau (founder & data designer @ Deepskydata)

If you work with event data (tracking, attribution, etc.), you’ve seen how disorganized, inaccurate, and useless it can be in the absence of a well-executed plan. Timo’s been building tracking plans for 10+ years, and he recently released this free 1-hour course on how to develop, manage, and implement a good tracking plan.


Bonus: Timo is somewhat of a flowchart expert. It’s fascinating. He visualizes the information as fast as he speaks it. 😂



SQL Puzzle Optimization: The UDTF Approach For A Decay Function by Felipe Hoffa (data cloud advocate @ Snowflake)

Felipe documents his response to a SQL challenge on Elon Musk’s private social platform: SQL Puzzle: Calculating engagement with a decay function. The goal? To write the most efficient decay function in SQL.

It’s particularly tedious because the decay calculation must reference multiple rows from the same table, and an inefficient query might crash your warehouse 💥. Some participants leveraged window functions, others used joins, and Felipe chose Tabular JavaScript UDFs.

If you come across a similar problem, Tabular JavaScript UDFs might be your best option!
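
Felipe's UDTF solution (and the puzzle's exact decay definition) are in his post. The real puzzle compounds the decay row over row, which is exactly why it resists a simple GROUP BY, so purely as a simplified illustration of a decay-weighted score in plain SQL (table, columns, and half-life all made up):

    -- Each engagement's weight halves every 7 days (hypothetical decay parameters)
    SELECT user_id,
           SUM(engagement * POWER(0.5, DATEDIFF('day', event_date, CURRENT_DATE) / 7.0)) AS decayed_score
    FROM engagements
    GROUP BY user_id;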



Data modeling workshop: How to design a lasting business blueprint w/ Ergest Xheblati (data architect & author)

Yes, another Ergest workshop! And if your role has anything to do with data modeling – or if you simply want a high-level overview from an industry expert like Ergest – I strongly recommend giving it a watch.

You’ll learn:

🎯 Data modeling 101: What is it? Why's it so important?

🎯 Top-down vs bottom-up data modeling: Which one is best for the current state of your business?

🎯 Modeling types: Conceptual models, logical models, physical models... What's the difference?

🎯 Overview + pros & cons of popular physical data modeling types (One Big Table, Kimball Dimensional Modeling AKA Star Schema, Relational Modeling, Activity Stream, etc.)


2022 was a great year for data practitioners. And as the industry continues to boom in 2023, we're sure to see even more great content from data folks!

Want to be the first to see resources like this in the new year? 🎆 Join The Operational Analytics Club!
