
3 Methods for Exporting CSV Files from BigQuery | Census

Michel Zurkirchen · September 22, 2021

Michel Zurkirchen is an Amsterdam-based digital analyst and a regular contributor to the Census blog. After starting his career in digital marketing, Michel began learning Python and has been an enthusiastic learner and practitioner of data analytics ever since.

What you'll learn in this article: How to export a CSV from BigQuery using the following three methods:

  • BigQuery console export
  • bq command-line tool
  • Client library

You probably already know Google BigQuery is an amazing data warehouse. You can set it up to automatically import data from a wide range of systems, use it as a source for a great number of dashboarding tools, and so much more that I could spend the rest of this article detailing its use cases (that’s not what we’re going to do though, I promise).

Despite all of its capabilities, you can still find yourself in a situation where BigQuery just doesn't suit your needs for the task at hand and you want an old-fashioned CSV file. Maybe you're passing on data to someone who doesn't (and shouldn't) have access to BigQuery, perhaps you need that data to build something amazing in a tool that doesn't play well with BigQuery, or maybe you just want to know how to accomplish this for the sake of knowing it.

Whatever your motivation is, I've got you covered. Without further ado, here are three ways to export your Google BigQuery data to a CSV file for use in your destination apps.

Method #1: BigQuery console export

By far the easiest way of exporting your data to a CSV file is to use the web UI, also known as the BigQuery console. This is a great option if you just need to export your data once. To start, make sure you’re in the correct project, which you can confirm at the top of the screen, just left of the search bar. You can then complete the following steps:

  1. In the Explorer panel, expand the correct project and dataset. Select the table you want to export.
  2. On the far right, click on Export and then Export to GCS, which stands for Google Cloud Storage.
  3. In the menu that opens, choose the GCS bucket you want to export the data to and the filename, for example, yourbucket/yourfolder/yourdata.csv. You can use the browse option to find the right bucket, or create a new one first if necessary. Optionally, you can choose a format other than CSV, as well as GZIP compression to reduce the file size.
  4. Go to GCS and navigate through the bucket to the location where you saved your CSV file. Click the download icon to the right of the filename to save the file locally.

And voilà! You should now have a CSV with the data you care about for your use case (whatever it is).

Method #2: bq command-line tool

BigQuery comes with its own command-line tool called bq, which you can access in the console or in your local terminal by first installing the Google Cloud SDK. In the console, you can open it by clicking the Cloud Shell (terminal) icon to the right of the search bar. While it's not as straightforward as the previous option, it can be quicker than clicking through several menus once you've memorized the commands. The basic syntax to export a CSV file is:


bq extract [--FLAGS] RESOURCE DESTINATION

RESOURCE is the table that you are exporting and DESTINATION is the bucket and location in GCS that you are exporting to. FLAGS are optional and allow you to:

  1. Choose a compression type, if any.
  2. Choose a file format, CSV being the default.
  3. Choose a delimiter for CSV files.
  4. Choose whether to include the header, the default being true.

If you were to use all the available flags to export the well-known Austin bike-sharing data from BigQuery's public datasets, the command would look like this:


bq extract --compression=GZIP --destination_format=CSV --field_delimiter=tab --print_header=false \
  bigquery-public-data:austin_bikeshare.bikeshare_stations \
  gs://your_bucket/your_folder/your_file.csv.gzip

The reference documentation lists all the accepted values for these flags. The extract command exports the CSV file to your GCS bucket. If you installed the Google Cloud SDK, you can then use the following command in your local terminal to download the file from GCS:


gsutil cp gs://your_bucket/your_folder/your_file.csv.gzip /your_local_folder

Using the Google Cloud SDK is useful if you frequently need to export your BigQuery data and especially if you know your way around cron jobs.
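
If you go the cron route, here's a minimal sketch of a crontab entry that re-runs the export nightly and pulls the file down locally. The schedule, bucket, folder, and local path are placeholders reused from the examples above; adjust them to your own environment.


# Hypothetical crontab entry: export to GCS every night at 02:00, then download the result
0 2 * * * bq extract --destination_format=CSV bigquery-public-data:austin_bikeshare.bikeshare_stations gs://your_bucket/your_folder/your_file.csv && gsutil cp gs://your_bucket/your_folder/your_file.csv /your_local_folder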

Method #3: Client library

Last but certainly not least is the option of exporting CSV files using the client library. If you’re familiar with BigQuery, it probably doesn't surprise you that you can access the BigQuery API through a client library in several different languages such as Node.js, Java, and Python.

This option takes the most effort to set up, but once you're up and running, you can integrate it neatly into an automated workflow. Google has a quickstart guide to get you ready for your first query. Once you're ready, you can export your data by adapting the example Python code below to fit your needs.


# Import library
from google.cloud import bigquery

# Declare client
client = bigquery.Client()

Should declaring a Client result in a DefaultCredentialsError, you'll need to set up authentication so the client can reach your Google Cloud project.
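
One common fix (an assumption here, since the right solution depends on how your environment is set up) is to point the GOOGLE_APPLICATION_CREDENTIALS environment variable at a service account key file before running your script:


# Placeholder path to a service account key with BigQuery and GCS permissions
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your_service_account_key.json

With credentials in place, you can continue with the rest of the script.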


# Parameters for your case 
project = 'bigquery-public-data'
dataset_id = 'austin_bikeshare'
table_id = 'bikeshare_stations'
table_location = 'US' # Open the table in the BigQuery web UI, go to the details tab and look for 'Data location' 

bucket_name = 'your_bucket'
file_location = 'your_folder/austin_bikeshare.csv'
destination_uri = f'gs://{bucket_name}/{file_location}'
dataset_ref = bigquery.DatasetReference(project, dataset_id)
table_ref = dataset_ref.table(table_id)

# Exporting the entire table
extract_job = client.extract_table(
    table_ref,
    destination_uri,
    location = table_location,
)

extract_job.result()

The entire table has now been stored in GCS as a CSV file. You can download it with the code below.


from google.cloud import storage

# The blob name must match the full path used in the export above (file_location)
bucket_name = 'your_bucket'
source_file_name = 'your_folder/austin_bikeshare.csv'
destination_file_name = 'austin.csv'

# Point the GCS client at the bucket and the exported file
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(source_file_name)

# Download the file to your local machine
blob.download_to_filename(destination_file_name)

Alternatively, you can use a standard SQL query to limit the data that is exported and immediately store the result in a CSV, bypassing GCS.


# Querying a subset of the table
sql_query = """
    SELECT station_id, name, status
    FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
    WHERE status = 'active'
"""

# Storing the data in a pandas DataFrame
df = client.query(sql_query).to_dataframe()

# Saving the DataFrame to a CSV
df.to_csv('filename.csv')

Once you’ve executed the code above, you’re all set to continue your data adventure, CSV in hand. Before you go, however, there are some final notes on exporting CSV files from Google BigQuery that you should know.

Know before you go: Limitations with exporting CSV files from BigQuery

There you have it. Whether you needed this information for day-to-day data work, for ad hoc analysis, or were just curious whether it was possible, you now know three different methods to export your data as a CSV file from BigQuery. There are, however, several limitations to keep in mind. Pay special attention to the ones below to save yourself some troubleshooting.

First, you can export at most 1 GB of table data to a single file in GCS.

If your data exceeds that limit, use a wildcard to export the data to multiple files, like so: gs://your_bucket/your_file_name_*.csv.

This will create files called gs://your_bucket/your_file_name_000000000000.csv, gs://your_bucket/your_file_name_000000000001.csv and so on.
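
As a sketch using the bq tool from Method #2 (the table, bucket, and file names here are placeholders), a wildcard export looks like this:


bq extract --destination_format=CSV your_project:your_dataset.your_large_table gs://your_bucket/your_file_name_*.csv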

Second, it’s not possible to export nested and repeated data (Google Analytics data, for example) to a CSV file. You’ll have to flatten it first, for instance with the UNNEST function, as in the sketch below.
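
Here's a minimal sketch of that workflow using the bq tool, assuming a placeholder table with a repeated column (your_project.your_dataset.your_table and repeated_column are not real names): materialize the flattened query result in a table, then extract that table as usual.


# Flatten the repeated column into rows and store the result in a new table
bq query --use_legacy_sql=false --destination_table=your_dataset.flattened_table 'SELECT t.id, item FROM `your_project.your_dataset.your_table` AS t, UNNEST(t.repeated_column) AS item'
# Export the flattened table to GCS as before
bq extract --destination_format=CSV your_dataset.flattened_table gs://your_bucket/flattened.csv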

If, after taking a look at those limitations, you’re not so sure you want to repeat these methods over and over again, or go through the effort of setting up the export through the BigQuery client library, I have great news for you: if you just use Census, you won’t have to do any of this.

That’s right. A reverse ETL tool makes it fast and easy to send your data from BigQuery to just about any destination you need it in, so you can spend less time reading tutorials like this and more time on data analytics. After playing around with the CSV export options above, go ahead and check out how easy life could be with Census instead.
