Tutorials

How to unload data from Snowflake | Census

Michel Zurkirchen
Michel Zurkirchen October 25, 2022

Michel Zurkirchen is an Amsterdam-based digital analyst and a regular contributor to the Census blog. After starting his career in digital marketing, Michel started learning Python and has been an enthusiastic learner and practitioner of data analytics ever since. Amsterdam, North Holland, Netherlands

If you're one of the new kids on the Snowflake block, you might be wondering how you can unload your data from Snowflake so that you can put it to use in other places. Maybe you just moved here from the Redshift block and are wondering why Snowflake doesn't have an UNLOAD command – or you're just in a general state of confusion. 😶‍🌫️

Either way, we'll help you get started on your Snowflake data unloading journey by showing you how to get the job done. While it might be tempting to jump in and get started on your own, trust us: Spending a few minutes to read this article can save you time and money in the long run.

(We'll even throw in a few things you should consider before you get started).

Why unload data from Snowflake?

The first step in this process is always the same: Ask yourself why you're unloading data to begin with. There are different ways to get the job done, but which way is the best depends on your situation. 👇

Are you using the data in a different business tool?

Gathering data is great, but at some point, you'll want to do something with it. You can build a dashboard in Tableau, merge it with other data in your CRM, or do a number of other useful things with it. 

But you run the risk of wasting a lot of your time if you automate an export process, only to find out later that a single manual export would have sufficed. Conversely, you don't want to perform the same manual export every day – unless you consider it a meditative experience. So, evaluate whether you’ll need this data in your tools more than once before deciding whether you’ll manually or automatically unload your data.

Maybe you’re moving data to a different data service?

Your company probably uses multiple data storage services, including (at a minimum) a data lake and a data warehouse. At some point, then, you'll want to easily unload data from Snowflake to another cloud environment, such as Amazon S3 or Google Cloud Storage. Great news: You can. 👍

Backup your data

Regardless of how you're using your data, make sure the raw data is stored safely. If you're manipulating your data in any way, back it up – or it could be gone for good. Depending on the scale of what you're about to do, a suitable backup could be as simple as a different table in the same database. However, if you have a more complex case, it might mean copying your entire database to a different platform. 

How to unload data from Snowflake

Now that we’ve gone through some different reasons why you might be unloading data from Snowflake and how that changes your approach, it’s time to get to the good stuff: The how. 

The COPY INTO command enables you to copy an entire table or a query result to a Snowflake stage, from where you can download the data to your local system. Alternatively, you can immediately unload the data to Amazon S3, Google Cloud Storage, or Microsoft Azure. The syntax for the command is as follows.

COPY INTO { internalStage | externalStage | externalLocation }
     FROM { [<namespace>.]<table_name> | ( <query> ) }
[ STORAGE_INTEGRATION = <integration_name> ]
[ PARTITION BY <expr> ]
[ FILE_FORMAT = ( { FORMAT_NAME = '[<namespace>.]<file_format_name>' |
                    TYPE = { CSV | JSON | PARQUET } [ formatTypeOptions ] } ) ]
[ copyOptions ]
[ VALIDATION_MODE = RETURN_ROWS ]
[ HEADER ]

Let's break this down line by line. COPY INTO specifies the destination to unload data to, whether that’s external or internal.

-- Internal or external stage
COPY INTO @your-stage/your-data.csv

-- Google Cloud Bucket
COPY INTO 'gcs://your-bucket/your-folder/'

-- Amazon S3
COPY INTO 's3://your-bucket/your-folder/'

-- Microsoft Azure
COPY INTO 'azure://your-account.blob.core.windows.net/your-bucket/your-folder/'
FROM specifies the source, and can be either a query or a table.

-- Copying an entire table
FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CUSTOMER

-- Copying using a query
FROM (SELECT C_CUSTOMER_ID, C_FIRST_NAME, C_LAST_NAME, C_EMAIL_ADDRESS FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CUSTOMER)

STORAGE_INTEGRATION is only needed when unloading to an external storage location. The integration contains authorization to make it possible for Snowflake to write data to the external location. Setting up the integration is a separate process that you need to perform before unloading your data.

STORAGE_INTEGRATION = YOUR_INTEGRATION

PARTITION BY splits the data into separate files based on a string input. For this purpose, we'll split our data based on the first letter of the last name of our customers. Make sure you don't PARTITION BY any columns containing sensitive data, or you might inadvertently cause a data leak. 😱

PARTITION BY LEFT(C_LAST_NAME, 1)

We'll set our file type to CSV for this example (though we have a few different options, but we'll get back to in the next section). We'll also compress it to GZIP. If you plan on repeatedly unloading data, you’ll need to ensure consistency. In that case, consider creating a named file format, with which you can define file type, compression, and other parameters upfront.

-- Setting a file type and compression
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP')

-- Applying a named file format
FILE_FORMAT = (FORMAT_NAME = YOUR_NAMED_FORMAT)

The copyOptions are optional parameters you can include to further tweak the result of COPY INTO. We'll stick to using only one of them, which will output the path, name, size and number of rows of each file that you unloaded, so you can verify that everything went correctly.

DETAILED_OUTPUT = TRUE

VALIDATION_MODE only accepts one possible value, which is RETURN_ROWS. Including VALIDATION_MODE = RETURN_ROWS tells Snowflake to return the rows instead of writing to your destination, so you can determine whether you have the correct data. This is definitely a good precaution to take so that you don't spend time and budget writing the wrong data. Once you're certain that you're all good to go, remove VALIDATION_MODE to write the data to your destination.

With HEADER you can specify whether you want the data to include… 🥁🥁🥁

Headers! Shocking. It can be set to TRUE or FALSE, with FALSE being the default.

HEADER = TRUE

If we combine all of the above, we get the query below. Barring any errors, such as your storage integration not being set up correctly, executing this query will result in your Snowflake data being unloaded to your Google Cloud Storage. Note that we left out VALIDATION_MODE = TRUE. Always confirm at the destination that you successfully unloaded data.

COPY INTO 'gcs://your-bucket/your-folder/'
FROM (SELECT C_CUSTOMER_ID, C_FIRST_NAME, C_LAST_NAME, C_EMAIL_ADDRESS FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CUSTOMER)
STORAGE_INTEGRATION = YOUR_INTEGRATION
PARTITION BY LEFT(C_LAST_NAME, 1)
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP')
DETAILED_OUTPUT = TRUE
HEADER = TRUE;

If you unloaded to an internal stage rather than an external location, you could download your data to your local machine with the GET command, like so.

-- Linux/macOS
GET @%your-table file:///your-local-folder/;

-- Windows
GET @%your-table file://C:\your-local-folder\;

Which file type to unload to

Before actually unloading data, consider which file type is the best option. Each file type comes with its own pros and cons, but a good starting point is checking which file types can handle your data. For example, JSON can only be used for VARIANT columns, whereas you can’t use Parquet when you have nested data in VARIANT columns. 

You also need to determine which file types your destination can handle. Storing your data as a Parquet file isn't a problem for major data platforms such as Google Cloud Storage, but you won't be making any friends with your colleague who wants to do a quick analysis in Excel. 😬

Also, think of how much data you’re unloading and ingesting. It doesn't really matter what you do if your data is only a few MBs, but if you're unloading terabytes of data, you’ll want to reduce your file size by choosing Parquet when possible – plus it will greatly cut down on costs. 

Where to go from here

Hopefully, you ended up a little wiser than before you started, and you now feel secure in handling whichever Snowflake unloading task comes your way. But if all the manual or automatic unloading setup sounds like a bear, consider Census reverse ETL instead. 

Census has over 85 destination integrations for your Snowflake data, helping you get your data where you need it (without the incurred frustration from the added step). With sync speeds over 4,000 records per second, detailed observability, and 50 fields for free, it’s easy to get started and grow to any scale. 🚀

Book a demo with a product specialist to see how Census can help you operationalize your data. 💪