Tutorials

Snowsight Tutorial: Go for a spin with Citi Bike | Census

Trevor Fox
Trevor Fox April 13, 2022

Trevor leads the growth team at Census. His passion for taking action on data began with a brief baseball career causing him to devour scouting reports to find a competitive edge. He's spent the last 10 years learning about digital marketing, programming, and data with a mission to innovate in the growth space. Portland, Oregon, United States

If you’re new to Snowflake or you're using the classic web console to run SQL queries on Snowflake data... Welcome!

You’re in for a sweet ride, so hop on – we'll show you around! 🚲

Snowsight is a remarkably quick and intuitive second-generation Snowflake web interface that can be used by all accounts running on Amazon Web Services (AWS), Microsoft Azure, and the Google Cloud Platform. Of course, you can use Snowsight to run queries and manage your Snowflake data cloud, but you can also do so much more.

In Snowsight, you can create, modify, and manage users, databases, and database objects, share data with other Snowflake accounts and navigate the Snowflake Data Marketplace all in one streamlined interface.

Unlike traditional business intelligence tools like Tableau, Qlik, or Power BI which are more geared towards business users and corporate reporting needs, Snowsight uses fast querying, SQL auto-completion, automatic stats, and various charts for data visualization. What does that mean for users? Basically, you can complete ad-hoc data analysis on massive datasets in a jiffy.

In this tutorial, you’ll learn how to get started with Snowsight by creating a worksheet, running a query, analyzing the results, and summarizing the results with automatically generated contextual statistics. To explore the interface, we’ll take Snowflake’s Citi Bike sample data for a spin.

Get ready to ride 🚴

Picture this: the year is 2018 and you’re a member of Lyft’s marketing team. Lyft has just acquired the popular New York City bike-sharing platform, Citi Bike, and you’re interested in looking at some of their data for future marketing campaign strategizing purposes.

Unfortunately, you find that Citi Bike tracks information about every, single ride (including where it was unlocked, how long the ride was, and where it was docked) as well as some demographic and subscription information, giving you tons of data to wade through.

With 8 million data points in 2018 alone, a typical analysis with this volume of data would be painful (not to mention cumbersome), but with Snowsight in Snowflake you can quickly and painlessly extract the information you need.

Running SQL queries with a Snowsight worksheet

Snowsight is the user interface/user experience (UI/UX) replacement for the Classic Console in Snowflake and comes equipped with many unique, out-of-the-box query improvements.

When you first log in to your Snowflake instance you can either import your old worksheets (if you previously used the Classic Console), or create a new worksheet. For this example, we'll start by creating a new worksheet, renaming this worksheet “CITIBIKE”, and connecting it to the Citi Bike database that we have previously loaded with Citi Bike “trips data."

If you start typing a query, you’ll notice the first improvement of Snowsight is the SQL “auto-complete”, which helps complete the query for you to minimize mistakes. You can see here that typing “tri” after “select *” brings up some of the existing tables and even shows how many rows are in each table. For this example, we will query “trips_2018."

Quick result summaries in the contextual statistics pane

After running the query, there will be several outputs. You’ll see a “results” section that shows the data in tabular form and, to the right of the query results, there will also be an area for Contextual Statistics. Snowflake automatically calculates these statistics and produces simple visualizations to help you make sense of your data at a glance. 🙌

Contextual statistics are generated automatically for queries that result in 100k or fewer rows; however, in this case, the query result was too large.

We’ll now re-run the query setting a limit of 999,999. Since this is just 1/8 of the data and we’re most interested in trips with the longest duration, we’ll sort by trip duration.

The statistics will take a little while to generate, but you’ll notice some interesting trends. For instance, of the ~1M longest trips, only 3 stations were the end destination for more than 1% of those trips, with the Central Park South & 6 Avenue stations being the most popular end station.

Are you interested in learning more about the type of membership for these trips? Those statistics are right at your fingertips! In fact, it shows that almost half of the riders have annual memberships, while the second-largest proportion has purchased just a 24-hour ride.

Now that you have this information, maybe you want to find out some more details about those specific riders who end at the Central Park South and 6 Avenue stations so you can figure out the best marketing campaign to target them. You might be wondering:

Do these riders have a smaller/larger proportion of annual memberships that might make them respond well to an annual discount?

Or, conversely,

Are these riders tourists?

If it’s the latter, it may make more sense to partner with other experiential companies to drive more traffic to these stations (no pun intended). Either way, we can use this question to form a new query and build a quick histogram to find out more details. For the new query, we’ll add a “where” filter using "end_station_name = 'Central Park S & 6 Ave'."

After running the query, the hypothesis that more tourists (or just first-time Citi Bikers) are using Citi Bike for a day seems fairly accurate. According to the chart, the gap between annual members and 24-hour customers is quite small, so an annual discount for a marketing campaign wouldn't pack a very powerful punch. 👊

If you’re considering other campaign options and you need other details (such as gender, seasonal patterns, or you want to combine multiple data sources), you can run a new query to get that information, too. Using Snowsight, you can determine the most effective approach to targeting these customers.

You can also use Snowsight to build these charts into a dashboard, then easily share them with your colleagues to tell a story or just to share some quick insights. 💡

Want to see how? Let's go! 🚲

Using Dashboards and Tiles to filter results

Using the information we analyzed in part one – the most common bike drop-off stations and the types of memberships – we'll create a new Dashboard by navigating to the Dashboards tab and clicking the blue Dashboard button, located at the top right of the window.

You can give the Dashboard any name you’d like and, from here, you can start adding Tiles. Tiles can either be newly added from the Dashboard itself, or they can be created from existing Worksheets. For this purpose, we’ll go into one of our existing Worksheets, click the 3 dots next to the worksheet name, and “Move” the Worksheet to the new Dashboard.

Of course, you can customize the look and feel of the query with your preferred chart types (whether that’s a histogram or just a results table). I decided on a bar chart, so now you can see the first building block of your Dashboard. 🏗️

Remember, the data we’re looking at is for all Citi Bike trips in 2018. During that time, the most popular ending stations were in busy areas, so we can probably infer that these users are mainly commuters that are dropping their bikes off at the end of the day.

Now, here’s where the power of Snowsight really comes in – you can add filters to the Dashboard. These filters allow you to easily start exploring the data without running SQL queries or connecting an outside visualization tool. Snowsight has two out-of-the-box filters, “:datebucket” and “:daterange”, with more details on these available if you explore the filter icon on the top left of the Dashboard.

In this case, we don’t care about grouping the dates into buckets, so we’ll focus only on the :daterange filter. To enable the filter, add it in the “where” clause within the Tile query, as shown.


select end_station_name, count(*)
from trips_2018
where date_trunc('DAY',STOPTIME) = :daterange
group by end_station_name
order by count(*) desc

📣

Builders beware! Keep in mind that the date is a timestamp and needs to be truncated with “date_trunc” for it to work properly in the :daterange filter.

Now that you’ve added this filter, you’ll see a date range filter option on the top of the Dashboard, so you can drill further into data within your date ranges of interest. Let’s see if the warm weather causes any differences in end stations by setting the date range to look at only June. ☀️

In June, there’s an interesting change in the data – the West St & Chambers St station moves to the top of the list, whereas before it was only the 5th most popular end station. This location is located near the Hudson River walkway and popular parks which could be an attraction for non-regular riders (like tourists) to take day trips. Let’s pull some additional data into the Dashboard to confirm, though, before coming to any marketing campaign conclusions.

We’ll add a new Tile to the Dashboard that groups the trips by membership_type and counts each type alongside a displayed percentage. When we apply this Tile, the June membership data shows that only about 10% of trips were day trips.

Snowsight also allows you to add your own filters to the Dashboard. I’ll add a filter for “end station”, so I can look more closely at particular stations. 🔍

To add a new filter, you’ll simply click the Filter button on the Dashboard again, but for a custom filter, you can use a SQL query or a predefined list. Since we’re doing a custom filter, I’ll use a query to find the distinct station names and order them alphabetically for ease of use. It looks something like this:


select distinct end_station_name
from trips_2018
order by end_station_name asc

Voila! Now you should see a new custom filter for end stations.

To see the filter appear on the Dashboard, add it to the most recently added Tile query in the “where” clause, as we did with the previous :daterange filter.


select membership_type, count(*), ratio_to_report(count(*)) over () as percentage
from trips_2018
where date_trunc('DAY',STOPTIME) = :daterange
and end_station_name = :endstation
group by membership_type
order by count(*) desc

The End Station filter should be at the top of the Dashboard, near the date range filter, so you can look more closely at the West St & Chambers St data.

Becoming data-driven with Snowsight

After doing a deeper dive into the data, you can see that there are far more annual membership riders than day riders for this particular drop-off location at this time of year, so June could be a great opportunity to create a discounted annual membership marketing campaign. Armed with this data-backed approach, your marketing team may decide to add an annual membership discount flier at this station or target these particular riders within the app itself. Whatever the team chooses to do, you can all feel confident that you’re using a data-driven approach thanks to Snowsight.

Keep in mind, these are just a few, quick takeaways from the data! You can share your Dashboard with other data analysts on your team to find even more valuable insights. There are also tons of other use cases, so you can easily use your Snowflake account as a one-stop shop to analyze any past performance and make informed decisions about your datasets moving forward.

If you actually do work at Lyft (and even if you don’t) and want to send your data from Snowflake to Salesforce, Google Ads, or Braze to run personalized marketing campaigns at scale, get your free Census account and start syncing today!

Related articles

Product News
Sync data 100x faster on Snowflake with Census Live Syncs
Sync data 100x faster on Snowflake with Census Live Syncs

For years, working with high-quality data in real time was an elusive goal for data teams. Two hurdles blocked real-time data activation on Snowflake from becoming a reality: Lack of low-latency data flows and transformation pipelines The compute cost of running queries at high frequency in order to provide real-time insights Today, we’re solving both of those challenges by partnering with Snowflake to support our real-time Live Syncs, which can be 100 times faster and 100 times cheaper to operate than traditional Reverse ETL. You can create a Live Sync using any Snowflake table (including Dynamic Tables) as a source, and sync data to over 200 business tools within seconds. We’re proud to offer the fastest Reverse ETL platform on the planet, and the only one capable of real-time activation with Snowflake. 👉 Luke Ambrosetti discusses Live Sync architecture in-depth on Snowflake’s Medium blog here. Real-Time Composable CDP with Snowflake Developed alongside Snowflake’s product team, we’re excited to enable the fastest-ever data activation on Snowflake. Today marks a massive paradigm shift in how quickly companies can leverage their first-party data to stay ahead of their competition. In the past, businesses had to implement their real-time use cases outside their Data Cloud by building a separate fast path, through hosted custom infrastructure and event buses, or piles of if-this-then-that no-code hacks — all with painful limitations such as lack of scalability, data silos, and low adaptability. Census Live Syncs were born to tear down the latency barrier that previously prevented companies from centralizing these integrations with all of their others. Census Live Syncs and Snowflake now combine to offer real-time CDP capabilities without having to abandon the Data Cloud. This Composable CDP approach transforms the Data Cloud infrastructure that companies already have into an engine that drives business growth and revenue, delivering huge cost savings and data-driven decisions without complex engineering. Together we’re enabling marketing and business teams to interact with customers at the moment of intent, deliver the most personalized recommendations, and update AI models with the freshest insights. Doing the Math: 100x Faster and 100x Cheaper There are two primary ways to use Census Live Syncs — through Snowflake Dynamic Tables, or directly through Snowflake Streams. Near real time: Dynamic Tables have a target lag of minimum 1 minute (as of March 2024). Real time: Live Syncs can operate off a Snowflake Stream directly to achieve true real-time activation in single-digit seconds. Using a real-world example, one of our customers was looking for real-time activation to personalize in-app content immediately. They replaced their previous hourly process with Census Live Syncs, achieving an end-to-end latency of <1 minute. They observed that Live Syncs are 144 times cheaper and 150 times faster than their previous Reverse ETL process. It’s rare to offer customers multiple orders of magnitude of improvement as part of a product release, but we did the math. Continuous Syncs (traditional Reverse ETL) Census Live Syncs Improvement Cost 24 hours = 24 Snowflake credits. 24 * $2 * 30 = $1440/month ⅙ of a credit per day. ⅙ * $2 * 30 = $10/month 144x Speed Transformation hourly job + 15 minutes for ETL = 75 minutes on average 30 seconds on average 150x Cost The previous method of lowest latency Reverse ETL, called Continuous Syncs, required a Snowflake compute platform to be live 24/7 in order to continuously detect changes. This was expensive and also wasteful for datasets that don’t change often. Assuming that one Snowflake credit is on average $2, traditional Reverse ETL costs 24 credits * $2 * 30 days = $1440 per month. Using Snowflake’s Streams to detect changes offers a huge saving in credits to detect changes, just 1/6th of a single credit in equivalent cost, lowering the cost to $10 per month. Speed Real-time activation also requires ETL and transformation workflows to be low latency. In this example, our customer needed real-time activation of an event that occurs 10 times per day. First, we reduced their ETL processing time to 1 second with our HTTP Request source. On the activation side, Live Syncs activate data with subsecond latency. 1 second HTTP Live Sync + 1 minute Dynamic Table refresh + 1 second Census Snowflake Live Sync = 1 minute end-to-end latency. This process can be even faster when using Live Syncs with a Snowflake Stream. For this customer, using Census Live Syncs on Snowflake was 144x cheaper and 150x faster than their previous Reverse ETL process How Live Syncs work It’s easy to set up a real-time workflow with Snowflake as a source in three steps:

Best Practices
How Retail Brands Should Implement Real-Time Data Platforms To Drive Revenue
How Retail Brands Should Implement Real-Time Data Platforms To Drive Revenue

Remember when the days of "Dear [First Name]" emails felt like cutting-edge personalization?

Product News
Why Census Embedded?
Why Census Embedded?

Last November, we shipped a new product: Census Embedded. It's a massive expansion of our footprint in the world of data. As I'll lay out here, it's a natural evolution of our platform in service of our mission and it's poised to help a lot of people get access to more great quality data.