Tutorials

How to Conduct Time Series Forecasting with SQL (with Example)

Terence Shin
April 11, 2022

Terence is a data enthusiast and professional working with top Canadian tech companies. He received his MBA from Quantic School of Business and Technology and is currently pursuing a master's in computational analytics at Georgia Tech. Markham, Ontario, Canada

The future is here – for data in business, that is. As machine learning begins to take its rightful place at the forefront of data, so does predictive modeling’s prevalence in businesses. In fact, it’s almost essential for businesses to leverage predictive modeling in their operations to remain competitive in this landscape.

Why should you conduct time series forecasting?

By accurately forecasting what events will happen in the future, you can be prepared for anything that comes your way. These newly opened doors will reward you with several real-world advantages:

  1. It gives you more time to strategize and execute your plan. Giving yourself more planning time allows you to refine your strategy, and ultimately, reduces the chance of error once it’s been implemented. Your future self will thank you. 🙏
  2. It can help you with resource allocation. By predicting what’s trending upwards or downwards, you can allocate resources away from initiatives that are expected to have a lower ROI and move them into initiatives with higher ROIs. The bottom line? You can optimize your team’s time and set your projects up for success, so there’s more room for profits in your organization. 💰
  3. It allows you to move faster than your competitors. Do you feel the need for speed? By making informed predictions about what will occur in the future, you can reassess and pivot more iteratively, allowing you to stay agile and adaptive. And yes, you can crush your competition along the way. 💪

We’re familiar with the word “forecasting” when applied in the context of “weather forecasting.” Sure, we may not necessarily be interested in the weather, but it describes the same concept: the prediction of a future trend. Although there are several types of predictive forecasting models, we’re going to focus on the popular (and valuable) approach known as time series forecasting. Future outcomes are, of course, unknown, but time series forecasting estimates them through careful analysis of historical data, using algorithms and evidence-based priors.

What exactly is time series forecasting?

Rather than using outliers or categories to make predictions, time series forecasting describes predictions that are made with time-stamped historical data. Can you feel the power of time-based data?!


Ok, maybe that’s an exaggeration, but in a business context, there are tons of applicable use cases. Time series forecasting applied to business can look like:

  • Predicting next month’s demand for a product to determine the amount of inventory you need.
  • Estimating the number of employees who are likely to leave the company next year so you can proactively develop a hiring plan that will satisfy the company’s upcoming needs.
  • Modeling the future of stock prices to determine which stock to add to the company’s portfolio.

TL;DR: the time component provides additional information that can be useful for predicting future outcomes. Adding time into the equation allows businesses to predict not just what will happen, but when it will happen, so they can proactively plan and execute accordingly.

Depending on the complexity of your time series problem, there are a variety of techniques available to manipulate it, ranging from relatively simple visualization tools that show how trends behave to more complex machine learning models that utilize specific data set structures.

Similarly, depending on your goal, you can choose from several time series forecasting methods that have been widely adopted by the data world. For starters, there’s the autoregressive moving average model (ARMA), the autoregressive integrated moving average model (ARIMA), the seasonal autoregressive integrated moving average model (SARIMA)... and the list goes on. In this tutorial, however, we’ll focus on the most fundamental of these techniques: the simple moving average (SMA).

Conducting simple time series forecasting in SQL

An SMA is a calculation used to surface long-term trends by smoothing out short-term fluctuations; that is, taking the mean of a given metric over a specified window in the past. Super useful, right? Unfortunately, SMAs can’t predict the exact future value of a metric from time series data. They can, however, still provide you with advantageous information based on past values.

To calculate SMAs in SQL, you’ll need two things to make up a time-series dataset:

  1. A date column (or other time step column)
  2. A column that represents a metric or a value that you want to forecast
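In table form, a minimal dataset meeting these two requirements might look like the following sketch. The table and column names match the query used later in this tutorial, but the sample values here are invented for illustration:

```sql
-- Hypothetical schema for this tutorial's examples.
-- "Date" is the time step column; "Conversions" is the metric to forecast.
CREATE TABLE daily_sales (
    Date        DATE,
    Conversions INTEGER
);

-- A few made-up rows so the queries below have something to work with.
INSERT INTO daily_sales (Date, Conversions) VALUES
    ('2021-05-14', 120),
    ('2021-05-15', 135),
    ('2021-05-16',  98);
```

Any table with a time step column and a numeric metric column will work the same way.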

For this tutorial, let’s say we’re interested in conversions, and we have the following dataset showing the number of conversions that a company had each day from May 14, 2021 to August 20, 2021.

Conversions time series forecast

To calculate a 7-day SMA for conversions, we could use the following code:

SELECT
  Date,
  Conversions,
  AVG(Conversions) OVER (ORDER BY Date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS SMA
FROM daily_sales

This code would then result in the following table and chart:

Simple moving average time series forecasting
SMA forecasting

Pretty painless, right? Notice how volatile the raw conversion data (blue line) is in the chart above. In this simplified scenario, you could probably guess that conversions are trending upwards, but raw data isn’t always this easy to read – hence the need for SMAs.
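If a 7-day window still looks too noisy, the window length is controlled entirely by the ROWS frame. For example, a smoother 28-day SMA (assuming the same hypothetical daily_sales table) only changes the number of preceding rows:

```sql
-- 27 preceding rows plus the current row = a 28-day moving average.
-- A wider window smooths more aggressively but reacts to changes more slowly.
SELECT
  Date,
  Conversions,
  AVG(Conversions) OVER (ORDER BY Date ROWS BETWEEN 27 PRECEDING AND CURRENT ROW) AS SMA_28
FROM daily_sales
```

One caveat: the earliest rows average over fewer points than the full window (the first row is just its own value), so treat the initial SMA values with some skepticism. For the rest of this walkthrough, we’ll stick with the 7-day version.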

By looking at the 7-day SMA instead of the raw data:

  • We can get a clear picture of a metric’s general trend. For the conversions in this period of interest, we can see that they’re trending upwards. 📈
  • We can reasonably predict where the SMA of the metric is heading. 🔮 If the most recent daily conversion values are above the current SMA, then the SMA should continue to trend upwards. Conversely, if conversions are consistently lower than the current SMA, we should expect it to begin trending downward.
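That second comparison can be done directly in SQL rather than by eyeballing the chart. Here’s a sketch (again assuming the hypothetical daily_sales table) that labels each day as above or below its own 7-day SMA:

```sql
-- Compute the SMA in a CTE so we can compare each day's raw value against it.
WITH sma AS (
  SELECT
    Date,
    Conversions,
    AVG(Conversions) OVER (ORDER BY Date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS SMA
  FROM daily_sales
)
SELECT
  Date,
  Conversions,
  SMA,
  CASE WHEN Conversions > SMA THEN 'above' ELSE 'at or below' END AS vs_sma
FROM sma
ORDER BY Date
```

A run of 'above' labels in the most recent rows suggests the SMA will keep climbing; a run of 'at or below' suggests the opposite.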

Voilà! Now you know how to build and interpret one of the most fundamental time-series models in SQL.

What’s next?

Want to stay on the SQL train? Check out these articles next: 👇

If you’re looking to expand your knowledge base beyond SQL, you’ll find a whole collection of how-tos from the Census team here.
