From REST API to Snowflake: Building a Metadata-Driven Azure Data Factory Pipeline

CelestInfo Software Solutions Pvt. Ltd. Jan 08, 2026

Last updated: January 2026

Quick answer: Build a metadata-driven ADF pipeline by storing API configurations (URL, endpoint, target table) in a Snowflake metadata table. Use a Lookup activity to read configurations, a ForEach activity to iterate over each API, a Copy activity to fetch JSON responses into ADLS Gen2, and a Snowflake stored procedure to dynamically create VARIANT tables and load the JSON data. Add new APIs by inserting rows into the metadata table -- no pipeline changes needed.

Introduction

A metadata-driven Azure Data Factory pipeline automates REST API ingestion into Snowflake by decoupling pipeline logic from source configuration. If you are new to ADF, start with our beginner guide to creating a pipeline in Azure Data Factory. Instead of building separate ADF pipelines for each API, this approach reads API configurations from a Snowflake metadata table, dynamically fetches data, and loads it into Snowflake VARIANT columns.


This metadata-driven ADF pipeline tutorial demonstrates how to build a reusable pipeline that automatically fetches data from multiple REST APIs, stores JSON responses in ADLS Gen2, creates Snowflake tables dynamically, and supports incremental loads with comprehensive execution logging.


By the end of this tutorial, you will have a reusable, production-ready pipeline architecture that supports incremental loads through time-window parameters, comprehensive execution logging, and full error handling with success and failure tracking. For another Fabric-based approach, explore our Azure SQL to Snowflake via Fabric and OneLake pipeline.


Architecture Overview


The end-to-end data flow follows this path:


REST API → ADLS Gen2 (JSON Files) → Snowflake (VARIANT Table)


The pipeline executes the following steps in sequence:


  1. Read Metadata: API details (URL, endpoint, target table name) are stored in a Snowflake metadata table called API_METADATA.
  2. ADF Lookup Activity: Reads all active API configurations from the metadata table.
  3. ForEach Activity: Iterates over each API configuration returned by the Lookup.
  4. REST Copy Activity: Fetches the API response using the Azure Data Factory REST connector.
  5. Save to ADLS: Stores the raw JSON response as a file in Azure Data Lake Storage Gen2.
  6. Dynamic Table Creation: Uses a Script Activity to create the Snowflake target table if it does not already exist.
  7. Load into Snowflake: A Copy Activity loads the JSON file from ADLS into the Snowflake VARIANT column.
  8. Log Execution Status: Script Activities log success or failure into an execution log table.
  9. Time Window Control: A pipeline parameter called END_TS controls time windows for incremental loads.

Linked Services


The pipeline requires three linked services to connect ADF to external systems:


Linked Service | Type | Purpose
REST Linked Service | REST | Connects to REST APIs using base URL and authentication settings. The base URL is parameterized so it can be set dynamically per API from the metadata table.
Snowflake Linked Service | Snowflake | Connects to your Snowflake account using account name, database, warehouse, and authentication credentials. Used for both reading metadata and writing data.
Azure Blob Storage Linked Service | Azure Blob Storage | Connects to ADLS Gen2 via SAS URL or service principal. Used as the intermediate staging area for JSON files before loading into Snowflake.
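Because the base URL is parameterized, the REST linked service refers to its parameter through ADF dynamic content rather than a hardcoded value. A minimal sketch, assuming the linked service parameter is named BaseURL (the name is illustrative and must match your own definition):

Parameterized Base URL - ADF Dynamic Content
@{linkedService().BaseURL}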

Datasets


Four datasets are required to support the pipeline activities:


Dataset | Format | Parameter | Purpose
JSON Dataset | JSON | FILENAME | Points to the ADLS location where JSON responses are saved. The filename is set dynamically from the API name and timestamp.
REST Dataset | REST | BaseURL | Represents the REST API source. The BaseURL parameter is populated from the metadata table for each API.
Snowflake Source Dataset | Snowflake | None | Used by the Lookup activity to read from the API_METADATA table.
Snowflake Sink Dataset | Snowflake | TABLENAME | Used by the Copy Activity to load JSON data into the dynamically specified Snowflake table.

Pipeline Design


Pipeline Parameter


The pipeline accepts one parameter:

Parameter | Purpose
END_TS | Timestamp that defines the end of the time window for incremental loads (see step 9 in the Architecture Overview). Activities inside the ForEach can reference it to bound the data being fetched.


Activity Flow


The pipeline is organized as follows:


1. Lookup Activity – Read API Metadata

The Lookup activity queries the API_METADATA table in Snowflake to retrieve all active API configurations. It uses a query like:

Lookup Query - Snowflake
SELECT API_ID, API_NAME, BASE_URL, ENDPOINT, TABLE_NAME
FROM API_METADATA
WHERE IS_ACTIVE = TRUE;

The output is a list of API records that the ForEach activity will iterate over.
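For incremental runs, the same query can be extended with the END_TS pipeline parameter and the LAST_RUN_TS column so that only APIs due for processing are returned. A hedged sketch, assuming LAST_RUN_TS is refreshed after each successful load:

Incremental Lookup Query - Snowflake
SELECT API_ID, API_NAME, BASE_URL, ENDPOINT, TABLE_NAME
FROM API_METADATA
WHERE IS_ACTIVE = TRUE
  AND (LAST_RUN_TS IS NULL OR LAST_RUN_TS < '@{pipeline().parameters.END_TS}');

The ForEach activity consumes this result set through its Items property, typically @activity('Lookup_API_Metadata').output.value (the Lookup activity name here is illustrative).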


2. ForEach Activity – Iterate Over Each API

The ForEach activity loops through each record returned by the Lookup. Inside the ForEach, the following activities are chained together:


2a. Script Activity (LOGSTART) – Log Start of Execution

Before fetching any data, a Script Activity inserts a log record into the EXECUTION_LOG table with a status of RUNNING:

Log Start - Snowflake Script
INSERT INTO EXECUTION_LOG (API_NAME, TABLE_NAME, STATUS, START_TIME)
VALUES (
  '@{item().API_NAME}',
  '@{item().TABLE_NAME}',
  'RUNNING',
  CURRENT_TIMESTAMP()
);

2b. Copy Activity (REST to JSON) – Fetch API and Save to ADLS

This Copy Activity uses the REST dataset as the source and the JSON dataset as the sink. The source URL is constructed dynamically by combining the BASE_URL and ENDPOINT fields from the metadata. The JSON file is saved to ADLS Gen2 with a filename based on the API name and current timestamp, ensuring each execution creates a unique file.
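A minimal sketch of the dynamic content expressions involved; concat, formatDateTime, and utcnow are standard ADF expression functions, and the exact property names depend on how the REST and JSON datasets are configured:

Dynamic URL and Filename - ADF Expressions
Base URL:     @{item().BASE_URL}
Relative URL: @{item().ENDPOINT}
File name:    @{concat(item().API_NAME, '_', formatDateTime(utcnow(), 'yyyyMMddHHmmss'), '.json')}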


2c. Script Activity (TABLE CREATION) – Create Snowflake Table Dynamically

A Script Activity executes a CREATE TABLE IF NOT EXISTS statement in Snowflake. The table uses a single VARIANT column to store the entire JSON response:

Dynamic Table Creation - Snowflake Script
CREATE TABLE IF NOT EXISTS @{item().TABLE_NAME} (
  RAW_DATA VARIANT,
  LOAD_TIMESTAMP TIMESTAMP DEFAULT CURRENT_TIMESTAMP(),
  SOURCE_FILE STRING
);

This eliminates the need to manually create tables before running the pipeline. Each new API added to the metadata table will automatically have its target table created on first execution.


2d. Copy Activity (JSON to Snowflake) – Load JSON into VARIANT Column

This Copy Activity reads the JSON file from ADLS and loads it into the Snowflake table created in the previous step. The Snowflake sink dataset uses the TABLENAME parameter, which is dynamically set to @{item().TABLE_NAME}. The JSON data is mapped to the RAW_DATA VARIANT column, preserving the full JSON structure.
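After a load completes, the ingested rows can be verified directly in Snowflake. A hedged example, assuming a target table named WEATHER_DATA and a top-level JSON field called city:

Verify Load - Snowflake
SELECT RAW_DATA:city::STRING AS CITY,
       LOAD_TIMESTAMP,
       SOURCE_FILE
FROM WEATHER_DATA
ORDER BY LOAD_TIMESTAMP DESC
LIMIT 10;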


2e. Script Activity (UPDATE LOG – SUCCESS)

On the success path of the Copy Activity, this Script Activity updates the execution log to mark the run as completed:

Log Success - Snowflake Script
UPDATE EXECUTION_LOG
SET STATUS = 'SUCCESS',
    END_TIME = CURRENT_TIMESTAMP()
WHERE API_NAME = '@{item().API_NAME}'
  AND STATUS = 'RUNNING';

2f. Script Activity (UPDATE LOG – FAIL)

On the failure path (connected via the On Failure dependency), this Script Activity captures the error and marks the run as failed. Note that error messages containing single quotes can break the generated SQL string, so consider escaping or truncating the message in production:

Log Failure - Snowflake Script
UPDATE EXECUTION_LOG
SET STATUS = 'FAILED',
    END_TIME = CURRENT_TIMESTAMP(),
    ERROR_MESSAGE = '@{activity('Copy_REST_to_JSON').error.message}'
WHERE API_NAME = '@{item().API_NAME}'
  AND STATUS = 'RUNNING';

Metadata Table Structure


The API_METADATA table stores all configuration needed for each API source. Adding a new API requires only inserting a new row into this table:

API_METADATA Table - Snowflake DDL
CREATE TABLE API_METADATA (
    API_ID INT,
    API_NAME STRING,
    BASE_URL STRING,
    ENDPOINT STRING,
    TABLE_NAME STRING,
    IS_ACTIVE BOOLEAN,
    LAST_RUN_TS TIMESTAMP
);

Column | Description
API_ID | Unique identifier for the API source
API_NAME | Descriptive name of the API (e.g., "WeatherAPI", "ExchangeRates")
BASE_URL | The root URL of the API (e.g., "https://api.example.com")
ENDPOINT | The specific API endpoint path (e.g., "/v1/data")
TABLE_NAME | Snowflake target table name where data will be loaded
IS_ACTIVE | Boolean flag to enable or disable an API without deleting the row
LAST_RUN_TS | Timestamp of the last successful execution, used for incremental tracking
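Onboarding a new source is then a single insert. The values below are illustrative:

Register a New API - Snowflake
INSERT INTO API_METADATA (API_ID, API_NAME, BASE_URL, ENDPOINT, TABLE_NAME, IS_ACTIVE)
VALUES (3, 'ExchangeRates', 'https://api.example.com', '/v1/rates', 'EXCHANGE_RATES_RAW', TRUE);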

Execution Log Table


The EXECUTION_LOG table captures the runtime status of every pipeline execution, providing complete observability:

EXECUTION_LOG Table - Snowflake DDL
CREATE TABLE EXECUTION_LOG (
    LOG_ID INT AUTOINCREMENT,
    API_NAME STRING,
    TABLE_NAME STRING,
    STATUS STRING,
    START_TIME TIMESTAMP,
    END_TIME TIMESTAMP,
    ERROR_MESSAGE STRING
);

Column | Description
LOG_ID | Auto-incrementing primary key for each log entry
API_NAME | Name of the API that was processed
TABLE_NAME | Target Snowflake table for this API
STATUS | Execution status: RUNNING, SUCCESS, or FAILED
START_TIME | Timestamp when the pipeline started processing this API
END_TIME | Timestamp when processing completed or failed
ERROR_MESSAGE | Captured error message if the execution failed (NULL on success)
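The log table makes day-to-day monitoring straightforward. For example, a simple query to review recent failures:

Recent Failures - Snowflake
SELECT API_NAME, TABLE_NAME, START_TIME, END_TIME, ERROR_MESSAGE
FROM EXECUTION_LOG
WHERE STATUS = 'FAILED'
ORDER BY START_TIME DESC;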

Key Benefits


  1. Zero pipeline changes for new sources: onboarding a new API is a single insert into the API_METADATA table.
  2. Dynamic table creation: target VARIANT tables are created automatically on first execution.
  3. Schema flexibility: Snowflake VARIANT columns absorb changes in JSON structure without DDL modifications.
  4. Full observability: every run is tracked in EXECUTION_LOG with start time, end time, status, and error details.
  5. Incremental loading: the END_TS parameter supports time-window-based processing.

Conclusion


Building a metadata-driven Azure Data Factory pipeline for REST API ingestion into Snowflake provides a scalable, maintainable, and production-ready architecture for enterprise data integration. By storing API configurations in a metadata table, the pipeline eliminates hardcoded values and makes it trivial to onboard new data sources.


The combination of ADF's ForEach activity for parallel processing, ADLS Gen2 for intermediate JSON storage, and Snowflake's VARIANT data type for flexible semi-structured data handling creates a robust end-to-end data ingestion framework. The comprehensive execution logging with success and failure tracking ensures full observability and simplifies troubleshooting.


This pattern is particularly valuable for organizations that need to ingest data from dozens or hundreds of REST APIs, where maintaining individual pipelines for each source would be impractical. With this metadata-driven approach, your data engineering team can focus on data quality and analytics rather than pipeline maintenance.

Chandra Sekhar, Senior ETL Engineer

Chandra Sekhar is a Senior ETL Engineer at CelestInfo specializing in Talend, Azure Data Factory, and building high-performance data integration pipelines.


Frequently Asked Questions

Q: What is a metadata-driven pipeline in Azure Data Factory?

A metadata-driven pipeline reads its configuration (API URLs, table names, parameters) from a metadata table rather than hardcoding values. This allows you to add new data sources by inserting rows into the metadata table without modifying the pipeline. In this architecture, the API_METADATA table in Snowflake stores all API configurations, and the ADF Lookup activity reads these at runtime to drive the ForEach processing loop.

Q: Why use VARIANT columns in Snowflake for JSON data?

Snowflake's VARIANT data type natively stores semi-structured data like JSON. It preserves the full JSON structure and allows you to query nested fields using dot notation and LATERAL FLATTEN, eliminating the need to define a rigid schema upfront. This is especially useful for REST API responses where the JSON structure may vary between APIs or change over time without requiring table schema modifications.
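A brief sketch of both access patterns, using a hypothetical table and field names (status, items, id):

Querying VARIANT Data - Snowflake
SELECT RAW_DATA:status::STRING AS STATUS,        -- dot notation on a top-level field
       f.value:id::NUMBER AS ITEM_ID             -- field inside an array element
FROM MY_API_TABLE,
     LATERAL FLATTEN(input => RAW_DATA:items) f; -- expand the items array into rows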

Q: How does the pipeline handle API failures?

The pipeline uses ADF's built-in error handling with On Failure dependency paths. If the REST Copy Activity or the Snowflake load Copy Activity fails, execution flows to a failure Script Activity that logs the error message and marks the status as FAILED in the EXECUTION_LOG table. This allows operations teams to monitor failures, investigate root causes, and retry specific APIs without rerunning the entire pipeline.
