Welcome to this step-by-step guide on loading unstructured JSON data from an AWS S3 bucket into Snowflake! If you’re new to Snowflake or data engineering, don’t worry—this blog will walk you through the process in a clear and approachable way. By the end, you’ll know how to set up an S3 integration, stage JSON data, parse it, and load it into both unstructured and structured tables in Snowflake. Let’s dive in!
Before we start, ensure you have:
First, let’s create a database and schema to organize our work.
To access files in S3, we need a storage integration to securely link Snowflake with your S3 bucket.
After creating storage integration, update the value of ‘STORAGE_AWS_EXTERNAL_ID’ and ‘STORAGE_AWS_IAM_USER_ARN’.
Copy the values from the output, and update the trust policy in AWS IAM role. If you cant understand this step please go through our previous blog.
Snowflake needs to know how to parse the JSON files. We’ll create a file format for JSON.
An external stage points to your S3 bucket, linking it to Snowflake via the storage integration.
After creating the external stage, check the external stage has connected correctly or not. Just by listing, what file are there in the stage.
Let’s create a table with a VARIANT column to store the raw JSON data.
After copying the data from external stage, check if the data is loaded correctly.
Now, let’s query and parse the JSON data in UNSTRUCT_TBL. We’ll use Snowflake’s JSON parsing capabilities.
If the JSON is a single object per row (not an array), use dot notation:
Note : while parsing, write the column names properly they are case sensitive.
If the JSON contains arrays (nested fileds) use the FLATTEN function
To make the data more query-friendly, let’s load it into a structured table with defined columns.
Define a table matching the JSON schema.
Extract fields from UNSTRUCT_TBL and insert them into STUDENTS_DATA.
Storage Note: Storing data in both UNSTRUCT_TBL and STUDENTS_DATA doubles storage costs. Consider dropping UNSTRUCT_TBL after loading if it’s no longer needed:
We can’t directly copy the JSON file from external stage to snowflake Sturcutre table. Start with creating another table to load the same file from S3 blog.
Let’s wrap it up!You’ve just taken a big step into the world of Snowflake by learning how to pull JSON data from an AWS S3 bucket and make it shine in structured tables.From setting up your S3 connection to parsing JSON, you’ve got the tools to turn messy data into something, grab your own JSON files, maybe try out Snowpipe for some automation flair.For more informations on Snowpipe, please visit our other blogs.