Tutorials

How to Fix SQL Compilation Error: Missing Column Specification | Census

Furqan Butt · November 30, 2022

Furqan is a software developer with more than three years of experience in the computer software industry. He writes on Medium for publications including Towards Data Science, and he's part of the AWS Community Builders Program for data and analytics. His core expertise is in data engineering, big data, cloud technologies, and backend development.

Data analytics and business intelligence have grown to be key considerations for most businesses in today's market. Organizations are now investing more in their data technologies, and there's an increasing focus on being data-driven. Unsurprisingly, Snowflake has emerged as a first-choice data warehousing technology solution for many, greatly enhancing the development experience when working on SQL-heavy tasks. 🏆

Although working with SQL on Snowflake is largely similar to working with any other SQL platform, there are some cases where the Snowflake environment differs from the rest. One such case is running into a missing column specification error, which arises due to the distinctive way that Snowflake's query processing engine evaluates and parses SQL queries. 

In this article, you'll take a closer look at this error to better understand what it is, how it arises, and how it can be resolved.

For starters, what’s a SQL compilation error?

Snowflake, like other database platforms, works with SQL and has implemented its own version of a SQL query engine. Although Snowflake largely follows standard SQL syntax, it has a few modifications to the syntax and semantics when compared to databases such as Postgres or MySQL.

As such, queries submitted to Snowflake for execution are compiled and validated to ensure they conform to Snowflake's query syntax rules. Like any other language, SQL conforms to a set of lexical usage rules and semantics. So, when a query is submitted to the query processing engine, it undergoes a process called compilation, during which the SQL statements undergo syntax checks and semantic checks to ensure compliance with these rules. 

Syntax Checks ensure that keywords are spelled correctly, statements are terminated with delimiters, and there aren't any unsupported formats in the query.

Semantic Checks ensure that the objects such as tables and columns used within the query are valid and available to the user.

Any syntax or semantic failure during the compilation process results in a compilation error.
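To make the distinction concrete, here are two queries against a hypothetical users table, each failing one of the checks above (the exact error wording may vary slightly between Snowflake versions):

-- fails the syntax check: misspelled keyword
SELCT user_id FROM users;
-- SQL compilation error: syntax error ... unexpected 'SELCT'

-- fails the semantic check: the column doesn't exist on the table
SELECT nonexistent_column FROM users;
-- SQL compilation error: ... invalid identifier 'NONEXISTENT_COLUMN'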

What causes "missing column specification" errors in particular?

"Missing column specification" is a type of SQL compilation error, meaning that this error arises due to some incorrect syntax in the SQL query you're trying to process. 🛑 Specifically, the missing column specification error arises when you use Snowflake's data definition language (DDL) to create a table or create a view (materialized or not).

If you're creating a new table, view, or materialized view from a SELECT statement, Snowflake's DDL requires that every column in the result set has an explicit name, which means any calculated or expression column must be given an alias. Here’s how this plays out in practice. 👇

Resolving "missing column specification" errors

Before you can reproduce the error on Snowflake, you'll need to set up the SnowSQL client on your machine. 

Setting up SnowSQL

Log in to your Snowflake account. If you don’t have a Snowflake account, you can create one here.

Follow the appropriate SnowSQL CLI installation guide for your system. After installing SnowSQL, execute the following command in the terminal to connect to your Snowflake account:

snowsql -a <account-identifier>.<aws-region>.aws -u <username>
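For example, with a hypothetical account locator xy12345 in us-east-1 and a user named jdoe, the command would look like this:

snowsql -a xy12345.us-east-1.aws -u jdoe   # hypothetical account and username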

On successful login, you should see the following:

* SnowSQL * v1.2.23
Type SQL statements or !help
<username>#(no warehouse)@(no database).(no schema)>

In a new terminal, clone the following GitHub repo:

git clone https://github.com/furqanshahid85-python/sf-resolve-missing-col-spec-error.git

The data files you'll be using in this tutorial can be found in the data_files folder in the repo, and the db.sql file contains the DDL for all the objects you need to create on Snowflake. From db.sql, execute the following commands in the SnowSQL terminal to create a warehouse, database, schema, and tables in Snowflake:

-- DDL for creating warehouse
CREATE OR REPLACE WAREHOUSE DEV_WAREHOUSE1 
WITH WAREHOUSE_SIZE ='XSMALL'
AUTO_SUSPEND        = 60
AUTO_RESUME         = TRUE
INITIALLY_SUSPENDED = TRUE
COMMENT = 'Warehouse for DEV purposes';
USE WAREHOUSE DEV_WAREHOUSE1;

-- DDL for Database creation
CREATE DATABASE IF NOT EXISTS ANALYTICALDB;
USE DATABASE ANALYTICALDB;

-- DDL for schema
CREATE SCHEMA IF NOT EXISTS PUBLIC;
USE SCHEMA PUBLIC;

-- DDL for tables
CREATE OR REPLACE TABLE ANALYTICALDB.PUBLIC.USERS (
USER_ID NUMBER(38,0),
JOINED_AT DATE,
COUNTRY_CODE VARCHAR(16777216)
);

CREATE OR REPLACE TABLE ANALYTICALDB.PUBLIC.BOOKINGS (
ID NUMBER(38,0),
STARTTIME DATE,
DURATION_HOURS NUMBER(38,0),
USER_ID NUMBER(38,0),
STATUS VARCHAR(16777216)
);

CREATE OR REPLACE TABLE ANALYTICALDB.PUBLIC.PAYMENTS (
ID NUMBER(38,0),
CREATED_AT DATE,
PAYMENT_AMOUNT FLOAT,
USER_ID NUMBER(38,0)
);

Your database objects are now created. Next, copy the data from the CSV files in the repo into the database tables. To do this, create Snowflake stage and file format objects. 

Then run the following commands from db.sql:

-- DDL for copying csv files
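-- NOTE: the PUT commands below assume the CSVs are under /tmp/data;
-- adjust the file:// paths to point at the data_files folder from the cloned repo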
CREATE OR REPLACE STAGE USERS_STAGE;
CREATE OR REPLACE FILE FORMAT USERS_FORMAT TYPE = 'CSV' FIELD_DELIMITER = ',';
PUT file:///tmp/data/Users.csv @USERS_STAGE;
CREATE OR REPLACE STAGE BOOKINGS_STAGE;
CREATE OR REPLACE FILE FORMAT BOOKINGS_FORMAT TYPE = 'CSV' FIELD_DELIMITER = ',';
PUT file:///tmp/data/Bookings.csv @BOOKINGS_STAGE;
CREATE OR REPLACE STAGE PAYMENTS_STAGE;
CREATE OR REPLACE FILE FORMAT PAYMENTS_FORMAT TYPE = 'CSV' FIELD_DELIMITER = ',';
PUT file:///tmp/data/Payments.csv @PAYMENTS_STAGE;

Next, run the following COPY commands from db.sql to copy the data into the Snowflake tables:

COPY INTO ANALYTICALDB.PUBLIC.USERS FROM @USERS_STAGE;
COPY INTO ANALYTICALDB.PUBLIC.BOOKINGS FROM @BOOKINGS_STAGE;
COPY INTO ANALYTICALDB.PUBLIC.PAYMENTS FROM @PAYMENTS_STAGE;
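If you want to confirm the loads succeeded before moving on, a quick row count against each table works (the exact counts depend on the CSVs in the repo's data_files folder):

-- optional sanity check: confirm each table now contains rows
SELECT 'USERS' AS table_name, COUNT(*) AS row_count FROM ANALYTICALDB.PUBLIC.USERS
UNION ALL
SELECT 'BOOKINGS', COUNT(*) FROM ANALYTICALDB.PUBLIC.BOOKINGS
UNION ALL
SELECT 'PAYMENTS', COUNT(*) FROM ANALYTICALDB.PUBLIC.PAYMENTS;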

At this point, your database tables look like this:

[Screenshot: the USERS, BOOKINGS, and PAYMENTS table definitions]

And your table data should look like this:

[Screenshot: sample rows from the loaded tables]

Solving the "missing column specification" error

Let's look at an example to demonstrate how to resolve this error. In the example, you'll be developing a report to provide insight into the percentage of users, segmented by country, who made their first payment within three days of registration. 

The report ranks each user's payments by payment date, keeps only the first payment (payment rank = 1) where the gap between the joining date and the payment date is three days or less, and then divides the number of users who made a payment within three days of registration by the total number of registered users to get a percentage per country. The results are stored in the view user_payments_report.

Start by executing the following query from db.sql in the GitHub repo:

create view user_payments_report as (

SELECT 
     u.country_code,
     COUNT(DISTINCT u.user_id) registered_users,
     COUNT(DISTINCT t.user_id) first_3_days_payment,
     (CAST(COUNT(DISTINCT t.user_id) AS REAL)  / count(DISTINCT u.user_id)) * 100 
FROM 
     Users u 
LEFT join (
            SELECT 
                u.user_id,
                u.joined_at joined_at,
                p.created_at created_at,
                Rank() OVER (partition by u.user_id ORDER BY p.created_at ASC) as payment_rank
            FROM
                Users u
                JOIN Payments p on u.user_id = p.user_id
                WHERE p.created_at >= u.joined_at
        )t
ON u.user_id = t.user_id 
AND t.created_at - t.joined_at  <= 3  
AND payment_rank = 1
GROUP BY
u.country_code
);

When you run the above query, it returns the error "SQL compilation error: Missing column specification," like so:

[Screenshot: missing column specification error message]

The reason for this error is that when you calculate the percentage of users who made a payment, you did not provide a column alias. Snowflake's SQL syntax requires that whenever you create a table, view, or materialized view from a SELECT statement containing calculated fields, such as the percentage in this example, you must add an explicit column alias for each calculated field.

Since you didn’t provide a column alias while creating a view from the query results, it triggered a syntax check failure.

-- missing column alias
(CAST(COUNT(DISTINCT t.user_id) AS REAL)  / count(DISTINCT u.user_id)) * 100

In order to fix the issue, you need to add a column alias for the calculated field, which in this case is percentage:

-- column alias added
(CAST(COUNT(DISTINCT t.user_id) AS REAL)  / count(DISTINCT u.user_id)) * 100 percentage
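For reference, here is the complete corrected statement, identical to the earlier query except for the added percentage alias:

create view user_payments_report as (
SELECT 
     u.country_code,
     COUNT(DISTINCT u.user_id) registered_users,
     COUNT(DISTINCT t.user_id) first_3_days_payment,
     (CAST(COUNT(DISTINCT t.user_id) AS REAL)  / count(DISTINCT u.user_id)) * 100 percentage
FROM 
     Users u 
LEFT join (
            SELECT 
                u.user_id,
                u.joined_at joined_at,
                p.created_at created_at,
                Rank() OVER (partition by u.user_id ORDER BY p.created_at ASC) as payment_rank
            FROM
                Users u
                JOIN Payments p on u.user_id = p.user_id
                WHERE p.created_at >= u.joined_at
        )t
ON u.user_id = t.user_id 
AND t.created_at - t.joined_at  <= 3  
AND payment_rank = 1
GROUP BY
u.country_code
);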

Now, if you run this query again, it executes successfully.

[Screenshot: query executed successfully]

Querying the USER_PAYMENTS_REPORT view returns the expected results.
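For example, a simple query against the view (using the warehouse, database, and schema set up earlier):

SELECT * FROM user_payments_report ORDER BY country_code;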

[Screenshot: results of querying the view]

The following scenarios illustrate additional cases in which the "Missing column specification" error occurs, and how to resolve it in each case:

select max(payment_amount) from payments;
--no DDL, no alias, query works
---------------------------------
create table max_payment as (select max(payment_amount) from payments);
-- alias not used, error: missing column specification

create table max_payment as (select max(payment_amount) as mp from payments);
-- alias used, works
---------------------------------
create view max_payment_view as (select max(payment_amount) from payments);
-- alias not used, error: missing column specification

create view max_payment_view as (select max(payment_amount) as mp from payments);
-- alias used, works
---------------------------------
create materialized view max_payment_view as (select max(payment_amount) from payments);
-- alias not used, error: missing column specification

create materialized view max_payment_view as (select max(payment_amount) as mp from payments);
-- alias used, works

Putting your newfound knowledge to use 🧠

In this article, you've learned what causes the "Missing column specification" error in Snowflake SQL. To recap, Snowflake's SQL syntax differs from competing platforms in that it requires explicit column aliases for calculated fields when creating tables, views, or materialized views, and you worked through a business case that reproduces the error and shows how to fix it.

Now, as businesses are moving towards being more data-driven, it's more crucial than ever to get data in your central data warehouses operational as soon as possible. This is where tools like Census come into play. 💪 Census provides a comprehensive solution for syncing data between Snowflake and business apps such as Salesforce, Stripe, Mixpanel, and Google Ads (just to name a few).

Because Census is the leading reverse ETL tool available, it allows you to move data from Snowflake into all of your tools without needing to spend time setting up and maintaining these integrations. 😮‍💨 Instead, you can focus your time and resources on what matters: Getting value from and taking action on your data.

👉 Want to start syncing your data? Book a demo with one of our product specialists to learn how we work for your specific operational analytics needs.
