
Snowflake SQL API tutorial with authentication and query examples

Stephen Allwright, September 14, 2022

Stephen is a data science leader based in Oslo, Norway. After making the self-taught career transition from finance to data himself, he now enjoys helping others do the same by writing simple-to-understand tutorials that are accessible for all skill levels. If you’d like to read more of Stephen’s writing, you can find it at https://stephenallwright.com/.

Snowflake is already one of the most popular data warehouses for analytics, but it’s becoming increasingly capable of supporting operational use cases as well. Now, Snowflake is making it easy to access your data directly within your applications, which, in turn, significantly boosts the value your data creates. This is great news for anybody who has recently looked at their Snowflake bills! 🤑

In order to put your Snowflake data to work in your applications, you’ll need a fluid and efficient way of accessing it. That’s where the Snowflake API comes in. 

What is the Snowflake SQL API?

The Snowflake SQL API is Snowflake’s own REST API that allows you to execute SQL queries in your instance, check or stop these queries, and fetch the resulting data. It’s not just limited to standard select queries, either; you can also make adjustments to existing tables and views. Now on version 2, this easy-to-use API simplifies and streamlines the process of using Snowflake directly in applications, putting endless SQL query opportunities at your fingertips. ✋

In this article, I will explain how the API is structured and walk through an example use case from start to finish.

Integrating the Snowflake SQL API yourself

A friendly disclaimer from all of us at Census: Integrating this API into your application is relatively simple, but remember that all code becomes technical debt that needs to be kept up to date and closely monitored. This takes vital time away from creating value with the very data you are accessing. ⏱️

As an alternative, Census can save you time and money – but we’ll cover that a bit later on. For now, let’s dive in. 🤿

Step-by-step guide for integrating the Snowflake SQL API

The example we will be working through in this tutorial is a typical use case for this API: fetching data from Snowflake for use in a CRM tool, where it can be used to improve the personalization of your customer communication.

What you need to get started

To be able to work with the API in this tutorial, you'll need to have the following ready:

  1. A Snowflake account 
  2. SnowSQL 1.2.17 or higher
  3. Key-Pair authentication to your Snowflake instance

How the Snowflake SQL API is structured

Let’s get started by looking at how the Snowflake SQL API is structured. The API is made up of one resource, /api/v2/statements/, and three endpoints:

  1. /api/v2/statements: For submitting a SQL statement for execution
  2. /api/v2/statements/<statement_handle>: For checking the status of a statement
  3. /api/v2/statements/<statement_handle>/cancel: For canceling a statement

Authentication for connection to Snowflake

Before we can work with our Snowflake instance, we need to authenticate ourselves, which can be done in one of two ways: OAuth or JSON Web Token (JWT). In this tutorial, we will be using JWT.

Assuming that we’ve already set up key-pair authentication with our Snowflake instance, we need to generate a time-limited JWT token using our private key:

$ snowsql -a <account> -u <user> --private-key-path <path to private key> --generate-jwt 

The JWT produced by this command will be used in the header to authenticate each API request. Below is an example of how the header should look using the JWT.

POST /api/v2/statements HTTP/1.1
Authorization: Bearer <jwt_token>
Content-Type: application/json
Accept: application/json
User-Agent: <application_name>/<version_number>
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT

This is the header that will be used for all requests via the API, and it's made up of 3 key elements:

  • Authorization. Carries the generated JWT used to authenticate the request
  • X-Snowflake-Authorization-Token-Type. Defines the type of authentication being used. KEYPAIR_JWT is entered when using JWT; this field would not be required if using OAuth
  • User-Agent. Defines the name and version of your application
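
If you’re calling the API from code, these headers only need to be assembled once and then reused for every request. Here’s a minimal sketch in Python using the requests library; the account identifier, the application name, and the SNOWFLAKE_JWT environment variable are placeholders for this example, and the JWT itself is assumed to have been generated with the snowsql command above.

import os
import requests

# Placeholder account identifier; the JWT is assumed to have been generated
# with `snowsql --generate-jwt` and exported as SNOWFLAKE_JWT.
ACCOUNT = "<account_identifier>"
BASE_URL = f"https://{ACCOUNT}.snowflakecomputing.com"
JWT_TOKEN = os.environ["SNOWFLAKE_JWT"]

HEADERS = {
    "Authorization": f"Bearer {JWT_TOKEN}",
    "Content-Type": "application/json",
    "Accept": "application/json",
    "User-Agent": "myApplication/1.0",
    "X-Snowflake-Authorization-Token-Type": "KEYPAIR_JWT",
}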

Request customer data from Snowflake using the SQL API

So far we’ve managed to authenticate our instance and create the header which will be used for our requests. Now, let’s use what we’ve created to start fetching the data.

In order to get the customer data we want for our CRM, we need to submit a request to execute the SQL command. To do this, we’ll combine our request header with a body containing our desired SQL command and submit this to the /api/v2/statements endpoint.

POST /api/v2/statements HTTP/1.1
Authorization: Bearer <jwt_token>
Content-Type: application/json
Accept: application/json
User-Agent: <application_name>/<version_number>
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT
{
    "statement": "select user_id, age, city from customers",
    "timeout": 60,
    "database": "production",
    "schema": "analytics",
    "warehouse": "small",
    "role": "developer"
}

Here, we have combined the header we created earlier with a request body containing all the information needed to run the SQL command. The fields in the request body are:

  • Statement: The SQL command we want to be run
  • Timeout: The number of seconds before the statement automatically times out
  • Database, schema, warehouse: Where the command should be run
  • Role: What role will be used to run this query

Once we’ve submitted a successful POST request, a QueryStatus object is returned, containing the metadata required to fetch the data when the query is completed.

{
    "code": "090001",
    "sqlState": "00000",
    "message": "successfully executed",
    "statementHandle": <statement_handle>,
    "statementStatusUrl": "/api/v2/statements/<statement_handle>"
}
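
To illustrate, here’s one way the same POST request could be submitted from Python, reusing the BASE_URL and HEADERS defined in the earlier sketch. The payload simply mirrors the request body shown above, and the variable names are illustrative.

# Submit the SQL statement for execution and capture the statement handle.
payload = {
    "statement": "select user_id, age, city from customers",
    "timeout": 60,
    "database": "production",
    "schema": "analytics",
    "warehouse": "small",
    "role": "developer",
}

resp = requests.post(f"{BASE_URL}/api/v2/statements", headers=HEADERS, json=payload)
resp.raise_for_status()

query_status = resp.json()
statement_handle = query_status["statementHandle"]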

Using variables to customize your Snowflake API request

You can also add variables to the SQL command. To denote a variable in the SQL command, add a ? in the desired location, then define this variable, including its value and data type, by using the bindings JSON object.

In our example, we could add a variable to denote the type of customers that we want to fetch into our CRM, which would look something like this:

…
{
    "statement": "select user_id, age, city from customers where customer_type = ?",
    "timeout": 60,
    "database": "production",
    "schema": "analytics",
    "warehouse": "small",
    "role": "developer",
    "bindings": {
        "1": {
            "type": "TEXT",
            "value": "B2C"
        }
    }
}
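
In the Python sketch from earlier, this only requires adding a bindings entry to the payload. The snippet below is an illustration of the pattern, using the same hypothetical variable names as before.

# Bind the value "B2C" to the first `?` placeholder in the statement.
payload_with_binding = {
    **payload,
    "statement": "select user_id, age, city from customers where customer_type = ?",
    "bindings": {
        "1": {"type": "TEXT", "value": "B2C"},
    },
}

resp = requests.post(f"{BASE_URL}/api/v2/statements", headers=HEADERS, json=payload_with_binding)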

Reading customer data from Snowflake into your application

At this point, we’ve constructed our request and started running the SQL command in our Snowflake instance, but how do we get the data from this running request? 🤔

Depending on how long the query takes to run, this data will either be returned directly in the response to the initial POST request or have to be fetched with a follow-up GET request. Here’s how to submit that GET request. 👇

Using the <statement_handle> found within the QueryStatus from the initial POST request, we can submit the GET request to return the first partition of data.

GET /api/v2/statements/<statement_handle> HTTP/1.1
Authorization: Bearer <jwt_token>
Content-Type: application/json
Accept: application/json
User-Agent: <application_name>/<version_number>
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT

A ResultSet object will be returned if the SQL command was successfully completed. 

This ResultSet contains metadata about the returned data as well as the first partition of data. For our example, it could look something like this:

{
  "code": "090001",
  "statementStatusUrl": "/api/v2/statements/<statement_handle>",
  "sqlState": "00000",
  "statementHandle": <statement_handle>,
  "message": "Statement executed successfully.",
  "createdOn": 1645742998434,
  "resultSetMetaData": {
      "rowType": [
          {
              "name": "USER_ID",
              "type": "TEXT",
              "length": 1024,
              "precision": 0,
              "scale": 0,
              "nullable": false
          },
          {
              "name": "AGE",
              "type": "TEXT",
              "length": 1024,
              "precision": 0,
              "scale": 0,
              "nullable": false
          },
          {
              "name": "CITY",
              "type": "TEXT",
              "length": 1024,
              "precision": 0,
              "scale": 0,
              "nullable": false
          }
      ],
      "numRows": 1000,
      "format": "jsonv2",
      "partitionInfo": [
          {
              "rowCount": 100,
              "uncompressedSize": 1000,
              "compressedSize": 500
          },
          {
              "rowCount": 100,
              "uncompressedSize": 1000,
              "compressedSize": 500
          },
          {
              "rowCount": 100,
              "uncompressedSize": 1000,
              "compressedSize": 500
          }
      ]
  },
  "data": [
      [
          "334240","29","Oslo"
      ],
      [
          "334241","45","Stockholm"
      ],
      [
          "334242","21","London"
      ],
      [
          "334243","40","Paris"
      ],
      [
          "334244","34","Berlin"
      ]
  ]
}

Since that's a lot to take in, let’s break this returned object down into its 3 primary areas:

  1. Header information. The first few objects are metadata related to the statement that was run (such as its current status)
  2. resultSetMetaData. Contains metadata about the returned data
    1. rowType. Metadata about each column such as the column name, data type, length, etc.
    2. partitionInfo. Describes each partition that makes up the full result set, telling us how many more we need to fetch
  3. data. The first partition of data returned by the query

Notice that all of the data has been returned as strings. This happens for every column regardless of its data type, so the values will need to be converted to the required data types after they’ve been returned.

The initial partition of data has been returned in this ResultSet, but the partitionInfo object shows that the full result set spans additional partitions. The remaining partitions can be fetched into our application with further GET requests to the /api/v2/statements/<statement_handle> endpoint, using the partition parameter.

GET /api/v2/statements/<statement_handle>?partition=1 HTTP/1.1
Authorization: Bearer <jwt_token>
Content-Type: application/json
Accept: application/json
User-Agent: <application_name>/<version_number>
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT
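
Putting it all together, here is a minimal Python sketch that polls the statement, collects every partition, and converts the string values into native types. It reuses BASE_URL, HEADERS, and statement_handle from the earlier sketches, and the variable names are illustrative rather than part of the API.

import time

status_url = f"{BASE_URL}/api/v2/statements/{statement_handle}"

# Poll until the statement finishes; a 202 response means it is still running.
resp = requests.get(status_url, headers=HEADERS)
while resp.status_code == 202:
    time.sleep(1)
    resp = requests.get(status_url, headers=HEADERS)
resp.raise_for_status()

result_set = resp.json()
all_rows = list(result_set["data"])

# Fetch any remaining partitions listed in resultSetMetaData.partitionInfo
# (the first partition is already included in the response above).
num_partitions = len(result_set["resultSetMetaData"]["partitionInfo"])
for partition in range(1, num_partitions):
    part = requests.get(status_url, headers=HEADERS, params={"partition": partition})
    part.raise_for_status()
    all_rows.extend(part.json()["data"])

# Every value comes back as a string, so cast columns to their intended types.
customers = [
    {"user_id": user_id, "age": int(age), "city": city}
    for user_id, age, city in all_rows
]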

What we’ve created with the Snowflake SQL API

And there you have it – we now have access to our B2C customer data within our CRM, so we can use it to personalize our customer communications. 🤩

To summarize, this is the process we went through to get this data in our application:

  1. Authenticated our connection to Snowflake using JWT
  2. Requested that a SQL command be run in Snowflake using a POST request to the /api/v2/statements endpoint
  3. Added variables to our SQL command to make it more flexible
  4. Fetched the data in multiple partitions using a GET request to the /api/v2/statements/<statement_handle> endpoint

If you wanted to take this example even further, you could introduce OAuth instead of JWT, resulting in a smoother developer experience.

Try the efficient, simpler, and safer alternative, Census

It’s clear from this example that the Snowflake SQL API is simple to get up and running initially. But once you start introducing it into a large application, the work becomes increasingly time-consuming, especially when you factor in ongoing maintenance. Given this hefty price tag, both in terms of time and opportunity cost, consider using Census instead.

Census is the leading reverse ETL tool available and will allow you to move data from Snowflake into all of your tools without needing to spend time setting up and maintaining these integrations. Instead, you can focus your time and resources on what matters: Getting value from and taking action on your data. 💪

✨ Want to start syncing your data? Book a demo with one of our product specialists to learn how we work for your specific operational analytics needs.
