The Official Census Blog
Check our product
TABLE OF CONTENTS
Get the best data & ops content (not just our post!) delivered straight to your inbox

Snowflake SQL API tutorial with authentication and query examples

Stephen is a data science leader based in Oslo, Norway. After making the self-taught career transition from finance to data himself, he now enjoys helping others do the same by writing simple-to-understand tutorials that are accessible for all skill levels. If you’d like to read more of Stephen’s writing, you can find it at https://stephenallwright.com/.

Stephen is a data science leader based in Oslo, Norway. After making the self-taught career transition from finance to data himself, he now enjoys helping others do the same by writing simple-to-understand tutorials that are accessible for all skill levels. If you’d like to read more of Stephen’s writing, you can find it at https://stephenallwright.com/.

Snowflake is already one of the most popular data warehouses for analytics, but it’s becoming increasingly more capable of supporting operational use cases as well. Now, they’re making it possible to access your data from Snowflake easily within your applications which, in turn, significantly boosts the value your data creates. This is great news for anybody who has recently looked at their Snowflake bills! 🤑

In order to put your Snowflake data to work in your applications, you’ll need a fluid and efficient way of accessing it. That’s where the Snowflake API comes in. 

What is the Snowflake SQL API

The Snowflake SQL API is Snowflake’s own REST API that allows you to execute SQL queries in your instance, check or stop these queries, and fetch the resulting data. It’s not just limited to standard select queries, either; you can also make adjustments to existing tables and views. Now on version 2, this easy-to-use API simplifies and streamlines the process of using Snowflake directly in applications, putting endless SQL query opportunities at your fingertips. ✋

In this article, I will explain how the API is structured and walk through an example use case from start to finish.

Integrating the Snowflake SQL API yourself

A friendly disclaimer from all of us at Census: Integrating this API into your application is relatively simple, but remember that all code becomes a technical debt that needs to be kept up-to-date and closely monitored. This takes vital time away from creating value using the very data you are accessing. ⏱️

As an alternative, Census can save you time and money – but we’ll cover that a bit later on. For now, let’s dive in. 🤿

Step-by-step guide for integrating the Snowflake SQL API

The example we will be working through in this tutorial is a typical use case for this API. We’ll start by fetching data from Snowflake for usage in a CRM tool, where it can be used to improve the personalization of your customer communication.

What you need to get started

To be able to work with the API in this tutorial, you'll need to have the following ready:

  1. A Snowflake account 
  2. SnowSQL 1.2.17 or higher
  3. Key-Pair authentication to your Snowflake instance

How the Snowflake SQL API is structured

Let’s get started by looking at how the Snowflake SQL API is structured. The API is made up of one resource, /api/v2/statements/, and three endpoints which are:

  1. /api/v2/statements: For submitting a SQL statement for execution
  2. /api/v2/statements/<statement_handle>: For checking the status of a statement
  3. /api/v2/statements/<statement_handle>/cancel: For canceling a statement

Authentication for connection to Snowflake

Before we work with our Snowflake instance we need to authenticate ourselves which can be done in one of two ways: OAuth or JSON Web Token (JWT). In this tutorial, we will be using JWT. 

Assuming that we’ve already set up key-pair authentication with our Snowflake instance, we need to generate a time-limited JWT token using our private key:

$ snowsql -a <account> -u <user> --private-key-path <path to private key> --generate-jwt 

The resulting JWT from this request will be used in the header to authenticate each API request. Below is an example of how the header should look using the JWT.

POST /api/v2/statements HTTP/1.1
Authorization: Bearer <jwt_token>
Content-Type: application/json
Accept: application/json
User-Agent: <application_name>/<version_number>
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT

This is the header that will be used for all requests via the API, and it's made up of 3 key elements:

  • Authorization. Where the generated JWT token will be stored to authenticate the request
  • X-Snowflake-Authorization-Token-Type. Defines the type of authentication that is being used. KEYPAIR_JWT is entered when using JWT. This field, however, would not be required if using OAUTH
  • User-Agent. Defines the name and version of your application

Request customer data from Snowflake using the SQL API

So far we’ve managed to authenticate our instance and create the header which will be used for our requests. Now, let’s use what we’ve created to start fetching the data.

In order to get the customer data we want for our CRM, we need to submit a request to execute the SQL command. To do this, we’ll combine our request header with a body containing our desired SQL command and submit this to the /api/v2/statements endpoint.

POST /api/v2/statements HTTP/1.1
Authorization: Bearer <jwt_token>
Content-Type: application/json
Accept: application/json
User-Agent: <application_name>/<version_number>
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT
{
"statement": "select user_id, age, city from customers",
"timeout": 60,
"database": "production",
"schema": "analytics",
"warehouse": "small",
"role": "developer"
}

Here, we have combined the header we created earlier with a request body containing all the information needed to undertake the SQL command. The fields in the request body are:

  • Statement: The SQL command we want to be run
  • Timeout: The number of seconds before the statement automatically times out
  • Database, schema, warehouse: Where the command should be run
  • Role: What role will be used to run this query

Once we’ve submitted a successful POST request, a QueryStatus object is returned, containing the metadata required to fetch the data when the query is completed.

{
"code": "090001",
"sqlState": "00000",
"message": "successfully executed",
"statementHandle": <statement_handle>,
"statementStatusUrl": "/api/v2/statements/<statement_handle>"
}

Using variables to customize your Snowflake API request

You can also add variables to the SQL command. To denote a variable in the SQL command, add a ? in the desired location, then define this variable, including its value and data type, by using the bindings JSON object.

In our example, we could add a variable to denote the type of customers that we want to fetch into our CRM, which would look something like this:

…
{
    "statement": "select user_id, age, city from customers where customer_type = ?",
    "timeout": 60,
    "database": "production",
    "schema": "analytics",
    "warehouse": "small",
    "role": "developer",
    "bindings": {
        "1": {
            "type": "TEXT",
            "value": "B2C"
        }
    }
}

Reading customer data from Snowflake into your application

At this point, we’ve constructed our request and started running the SQL command in our Snowflake instance, but how do we get the data from this running request? 🤔

Depending on the time it takes to run the query, this data will either be returned immediately from the initial POST request or after submitting a GET request. Here’s how to submit a GET request. 👇

Using the <statement_handle> found within the QueryStatus from the initial POST request, we can submit the GET request to return the first partition of data.

GET /api/v2/statements/<statement_handle> HTTP/1.1
Authorization: Bearer <jwt_token>
Content-Type: application/json
Accept: application/json
User-Agent: <application_name>/<version_number>
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT

A ResultSet object will be returned if the SQL command was successfully completed. 

This ResultSet contains metadata about the returning data as well as the first partition of data. For our example, it could look something like this:

{
  "code": "090001",
  "statementStatusUrl": "/api/v2/statements/<statement_handle>",
  "sqlState": "00000",
  "statementHandle": <statement_handle>,
  "message": "Statement executed successfully.",
  "createdOn": 1645742998434,
  "resultSetMetaData": {
      "rowType": [
          {
              "name": "USER_ID",
              "type": "TEXT",
              "length": 1024,
              "precision": 0,
              "scale": 0,
              "nullable": false
          },
          {
              "name": "AGE",
              "type": "TEXT",
              "length": 1024,
              "precision": 0,
              "scale": 0,
              "nullable": false
          },
          {
              "name": "CITY",
              "type": "TEXT",
              "length": 1024,
              "precision": 0,
              "scale": 0,
              "nullable": false
          }
      ],
      "numRows": 1000,
      "format": "jsonv2",
      "partitionInfo": [
          {
              "rowCount": 100,
              "uncompressedSize": 1000,
              "compressedSize": 500
          },
          {
              "rowCount": 100,
              "uncompressedSize": 1000,
              "compressedSize": 500
          },
          {
              "rowCount": 100,
              "uncompressedSize": 1000,
              "compressedSize": 500
          }
      ]
  },
  "data": [
      [
          "334240","29","Oslo"
      ],
      [
          "334241","45","Stockholm"
      ],
      [
          "334242","21","London"
      ],
      [
          "334243","40","Paris"
      ],
      [
          "334244","34","Berlin"
      ]
  ]
}

Since that's a lot to take in, let’s break this returning object down into its 3 primary areas:

  1. Header information. The first few objects are metadata related to the statement that was run (such as its current status)
  2. resultsSetMetaData. Contains metadata about the returned data
    1. rowType. Metadata about each column such as the column name, data type, length, etc
    2. partitionInfo. The extra data partitions we need to fetch for the full dataset
  3. data. The first partition of data returned by the query

Notice that all of the data has been returned as strings. This will happen for each column regardless of data type and will need to be converted to the required data type after they’ve been returned.

The initial partition of data has been returned in this ResultSet, but there are three extra partitions of data as defined by the partitionInfo object. These can be fetched into our application by using additional GET requests to the api/v2/statements/<statement_handle> endpoint with the partition argument.

GET /api/v2/statements/<statement_handle>
?partition=1 HTTP/1.1
Authorization: Bearer <jwt_token>
Content-Type: application/json
Accept: application/json
User-Agent: <application_name>/<version_number>
X-Snowflake-Authorization-Token-Type: KEYPAIR_JWT

What we’ve created with the Snowflake SQL API

And there you have it – we now have access to our B2C customer data within our CRM, so we can use it to personalize our customer communications. 🤩

To summarize, this is the process we went through to get this data in our application:

  1. Authenticated our connection to Snowflake using JWT
  2. Requested that a SQL command be run in Snowflake using a POST request to the  /api/v2/statements/ endpoint
  3. Added variables to our SQL command to improve the usability
  4. Fetched the data in multiple partitions using a GET request to the /api/v2/statements/<statement_handle> endpoint

If you wanted to take this example even further, you could introduce OAuth instead of JWT, resulting in a smoother developer experience.

Try the efficient, simpler, and safer alternative, Census

It’s clear from this example that the Snowflake SQL API is simple to get up and running initially. But once you start the process of introducing this to a large application, it will become increasingly time-consuming, especially when considering the maintenance that would be involved. Given this hefty price tag, both in terms of time and opportunity cost, consider using Census instead.

Census is the leading reverse ETL tool available and will allow you to move data from Snowflake into all of your tools without needing to spend time setting up and maintaining these integrations. Instead, you can focus your time and resources on what matters: Getting value from and taking action on your data. 💪

✨ Want to start syncing your data? Book a demo with one of our product specialists to learn how we work for your specific operational analytics needs.

Related integrations
No items found.
Get the best data & ops content (not just our post!) delivered straight to your inbox