From REST API to Snowflake: Building a Metadata-Driven Azure Data Factory Pipeline
Last updated: January 2026
Quick answer: Build a metadata-driven ADF pipeline by storing API configurations (URL, endpoint, target table) in a Snowflake metadata table. Use a Lookup activity to read configurations, a ForEach activity to iterate over each API, a Copy activity to fetch JSON responses into ADLS Gen2, and Script activities to dynamically create Snowflake VARIANT tables before a second Copy activity loads the JSON data. Add new APIs by inserting rows into the metadata table -- no pipeline changes needed.
Introduction
A metadata-driven Azure Data Factory pipeline automates REST API ingestion into Snowflake by decoupling pipeline logic from source configuration. If you are new to ADF, start with our beginner guide to creating a pipeline in Azure Data Factory. Instead of building separate ADF pipelines for each API, this approach reads API configurations from a Snowflake metadata table, dynamically fetches data, and loads it into Snowflake VARIANT columns.
This metadata-driven ADF pipeline tutorial demonstrates how to build a reusable pipeline that automatically fetches data from multiple REST APIs, stores JSON responses in ADLS Gen2, creates Snowflake tables dynamically, and supports incremental loads with comprehensive execution logging.
By the end of this tutorial, you will have a reusable, production-ready pipeline architecture that supports incremental loads through time-window parameters, comprehensive execution logging, and full error handling with success and failure tracking. For another Fabric-based approach, explore our Azure SQL to Snowflake via Fabric and OneLake pipeline.
Architecture Overview
The end-to-end data flow follows this path:
REST API → ADLS Gen2 (JSON Files) → Snowflake (VARIANT Table)
The pipeline executes the following steps in sequence:
- Read Metadata: API details (URL, endpoint, target table name) are stored in a Snowflake metadata table called API_METADATA.
- ADF Lookup Activity: Reads all active API configurations from the metadata table.
- ForEach Activity: Iterates over each API configuration returned by the Lookup.
- REST Copy Activity: Fetches the API response using the Azure Data Factory REST connector.
- Save to ADLS: Stores the raw JSON response as a file in Azure Data Lake Storage Gen2.
- Dynamic Table Creation: Uses a Script Activity to create the Snowflake target table if it does not already exist.
- Load into Snowflake: A Copy Activity loads the JSON file from ADLS into the Snowflake VARIANT column.
- Log Execution Status: Script Activities log success or failure into an execution log table.
- Time Window Control: A pipeline parameter called END_TS controls the time window for incremental loads.
Linked Services
The pipeline requires three linked services to connect ADF to external systems:
| Linked Service | Type | Purpose |
|---|---|---|
| REST Linked Service | REST | Connects to REST APIs using base URL and authentication settings. The base URL is parameterized so it can be dynamically set per API from the metadata table. |
| Snowflake Linked Service | Snowflake | Connects to your Snowflake account using account name, database, warehouse, and authentication credentials. Used for both reading metadata and writing data. |
| Azure Blob Storage Linked Service | Azure Blob Storage | Connects to ADLS Gen2 via SAS URL or service principal. Used as the intermediate staging area for JSON files before loading into Snowflake. |
Datasets
Four datasets are required to support the pipeline activities:
| Dataset | Format | Parameter | Purpose |
|---|---|---|---|
| JSON Dataset | JSON | FILENAME | Points to the ADLS location where JSON responses are saved. The filename is dynamically set based on the API name and timestamp. |
| REST Dataset | REST | BaseURL | Represents the REST API source. The BaseURL parameter is populated from the metadata table for each API. |
| Snowflake Source Dataset | Snowflake | None | Used by the Lookup activity to read from the API_METADATA table. |
| Snowflake Sink Dataset | Snowflake | TABLENAME | Used by the Copy Activity to load JSON data into the dynamically specified Snowflake table. |
Pipeline Design
Pipeline Parameter
The pipeline accepts one parameter:
- END_TS (String): Defines the end timestamp for the data extraction window. This enables incremental loading by allowing the pipeline to filter API requests based on a time range.
Activity Flow
The pipeline is organized as follows:
1. Lookup Activity – Read API Metadata
The Lookup activity queries the API_METADATA table in Snowflake to retrieve all active API configurations. It uses a query like:
SELECT API_ID, API_NAME, BASE_URL, ENDPOINT, TABLE_NAME
FROM API_METADATA
WHERE IS_ACTIVE = TRUE;
The output is a list of API records that the ForEach activity will iterate over.
2. ForEach Activity – Iterate Over Each API
The ForEach activity loops through each record returned by the Lookup. Inside the ForEach, the following activities are chained together:
2a. Script Activity (LOGSTART) – Log Start of Execution
Before fetching any data, a Script Activity inserts a log record into the EXECUTION_LOG table with a status of RUNNING:
INSERT INTO EXECUTION_LOG (API_NAME, TABLE_NAME, STATUS, START_TIME)
VALUES (
'@{item().API_NAME}',
'@{item().TABLE_NAME}',
'RUNNING',
CURRENT_TIMESTAMP()
);
2b. Copy Activity (REST to JSON) – Fetch API and Save to ADLS
This Copy Activity uses the REST dataset as the source and the JSON dataset as the sink. The source URL is constructed dynamically by combining the BASE_URL and ENDPOINT fields from the metadata. The JSON file is saved to ADLS Gen2 with a filename based on the API name and current timestamp, ensuring each execution creates a unique file.
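The dynamic values can be wired up with ADF dynamic content expressions. A sketch of what those expressions might look like -- the `end_ts` query-parameter name is an assumption here and depends entirely on the API you are calling:

```text
BaseURL (REST dataset):   @{item().BASE_URL}
Relative URL (source):    @{concat(item().ENDPOINT, '?end_ts=', pipeline().parameters.END_TS)}
FILENAME (JSON dataset):  @{concat(item().API_NAME, '_', utcNow('yyyyMMddHHmmss'), '.json')}
```

The `utcNow()` format string guarantees a unique filename per run, which keeps earlier raw JSON files in ADLS intact for reprocessing or auditing.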
2c. Script Activity (TABLE CREATION) – Create Snowflake Table Dynamically
A Script Activity executes a CREATE TABLE IF NOT EXISTS statement in Snowflake. The table uses a single VARIANT column to store the entire JSON response:
CREATE TABLE IF NOT EXISTS @{item().TABLE_NAME} (
RAW_DATA VARIANT,
LOAD_TIMESTAMP TIMESTAMP DEFAULT CURRENT_TIMESTAMP(),
SOURCE_FILE STRING
);
This eliminates the need to manually create tables before running the pipeline. Each new API added to the metadata table will automatically have its target table created on first execution.
2d. Copy Activity (JSON to Snowflake) – Load JSON into VARIANT Column
This Copy Activity reads the JSON file from ADLS and loads it into the Snowflake table created in the previous step. The Snowflake sink dataset uses the TABLENAME parameter, which is dynamically set to @{item().TABLE_NAME}. The JSON data is mapped to the RAW_DATA VARIANT column, preserving the full JSON structure.
2e. Script Activity (UPDATE LOG – SUCCESS)
On the success path of the Copy Activity, this Script Activity updates the execution log to mark the run as completed:
UPDATE EXECUTION_LOG
SET STATUS = 'SUCCESS',
END_TIME = CURRENT_TIMESTAMP()
WHERE API_NAME = '@{item().API_NAME}'
AND STATUS = 'RUNNING';
2f. Script Activity (UPDATE LOG – FAIL)
On the failure path (connected via the On Failure dependency), this Script Activity captures the error and marks the run as failed:
UPDATE EXECUTION_LOG
SET STATUS = 'FAILED',
END_TIME = CURRENT_TIMESTAMP(),
ERROR_MESSAGE = '@{activity('Copy_REST_to_JSON').error.message}'
WHERE API_NAME = '@{item().API_NAME}'
AND STATUS = 'RUNNING';
Metadata Table Structure
The API_METADATA table stores all configuration needed for each API source. Adding a new API requires only inserting a new row into this table:
CREATE TABLE API_METADATA (
API_ID INT,
API_NAME STRING,
BASE_URL STRING,
ENDPOINT STRING,
TABLE_NAME STRING,
IS_ACTIVE BOOLEAN,
LAST_RUN_TS TIMESTAMP
);
| Column | Description |
|---|---|
| API_ID | Unique identifier for the API source |
| API_NAME | Descriptive name of the API (e.g., "WeatherAPI", "ExchangeRates") |
| BASE_URL | The root URL of the API (e.g., "https://api.example.com") |
| ENDPOINT | The specific API endpoint path (e.g., "/v1/data") |
| TABLE_NAME | Snowflake target table name where data will be loaded |
| IS_ACTIVE | Boolean flag to enable or disable an API without deleting the row |
| LAST_RUN_TS | Timestamp of the last successful execution for incremental tracking |
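Onboarding a new source then reduces to a single insert. A sketch reusing the placeholder values from the column descriptions above (WeatherAPI, https://api.example.com, /v1/data, and a hypothetical WEATHER_RAW target table):

```sql
-- Hypothetical example: register a new API source.
-- All values below are placeholders, not part of the pipeline itself.
INSERT INTO API_METADATA
    (API_ID, API_NAME, BASE_URL, ENDPOINT, TABLE_NAME, IS_ACTIVE)
VALUES
    (1, 'WeatherAPI', 'https://api.example.com', '/v1/data', 'WEATHER_RAW', TRUE);
```

On the next pipeline run, the Lookup activity picks up this row, the Script Activity creates WEATHER_RAW if it does not exist, and the API is ingested with no ADF changes.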
Execution Log Table
The EXECUTION_LOG table captures the runtime status of every pipeline execution, providing complete observability:
CREATE TABLE EXECUTION_LOG (
LOG_ID INT AUTOINCREMENT,
API_NAME STRING,
TABLE_NAME STRING,
STATUS STRING,
START_TIME TIMESTAMP,
END_TIME TIMESTAMP,
ERROR_MESSAGE STRING
);
| Column | Description |
|---|---|
| LOG_ID | Auto-incrementing primary key for each log entry |
| API_NAME | Name of the API that was processed |
| TABLE_NAME | Target Snowflake table for this API |
| STATUS | Execution status: RUNNING, SUCCESS, or FAILED |
| START_TIME | Timestamp when the pipeline started processing this API |
| END_TIME | Timestamp when the processing completed or failed |
| ERROR_MESSAGE | Captured error message if the execution failed (NULL on success) |
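With this structure in place, a simple query gives operations teams a view of recent failures. One possible monitoring query:

```sql
-- Most recent failed runs with their captured error messages.
SELECT API_NAME, TABLE_NAME, START_TIME, END_TIME, ERROR_MESSAGE
FROM EXECUTION_LOG
WHERE STATUS = 'FAILED'
ORDER BY START_TIME DESC
LIMIT 20;
```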
Key Benefits
- Metadata-Driven Configuration: Add new REST APIs by simply inserting a row into the API_METADATA table. No pipeline modifications, no redeployment, and no code changes are required.
- Full Error Handling with Execution Logging: Every pipeline run is tracked with start time, end time, status, and error messages. The On Failure path ensures that failures are captured and logged rather than silently ignored.
- Dynamic Table Creation: The pipeline automatically creates Snowflake tables on first execution, eliminating the need for manual DDL scripts and reducing operational overhead.
- VARIANT Column Preserves Full JSON Structure: By storing API responses in Snowflake VARIANT columns, you preserve the complete JSON hierarchy. This allows flexible downstream querying using Snowflake dot notation and LATERAL FLATTEN without defining a rigid schema upfront.
- Time-Window Parameter for Incremental Loads: The END_TS pipeline parameter supports incremental data extraction, allowing the pipeline to fetch only new data since the last run.
- Scalable and Maintainable: The ForEach pattern scales horizontally as you add more APIs. The separation of configuration from logic makes the pipeline easy to maintain and extend.
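To illustrate the VARIANT benefit, here is a sketch of a downstream query. It assumes a hypothetical WEATHER_RAW table whose RAW_DATA column holds a payload shaped like `{"records": [{"city": "Oslo", "temp_c": 3}, ...]}`; adjust the paths to your actual API response:

```sql
-- Flatten a nested JSON array stored in a VARIANT column.
SELECT
    r.value:city::STRING  AS city,
    r.value:temp_c::FLOAT AS temp_c,
    t.LOAD_TIMESTAMP
FROM WEATHER_RAW t,
     LATERAL FLATTEN(input => t.RAW_DATA:records) r;
```

No schema changes are needed if the API later adds fields; new keys simply become available through the same dot-notation paths.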
Conclusion
Building a metadata-driven Azure Data Factory pipeline for REST API ingestion into Snowflake provides a scalable, maintainable, and production-ready architecture for enterprise data integration. By storing API configurations in a metadata table, the pipeline eliminates hardcoded values and makes it trivial to onboard new data sources.
The combination of ADF's ForEach activity for parallel processing, ADLS Gen2 for intermediate JSON storage, and Snowflake's VARIANT data type for flexible semi-structured data handling creates a robust end-to-end data ingestion framework. The comprehensive execution logging with success and failure tracking ensures full observability and simplifies troubleshooting.
This pattern is particularly valuable for organizations that need to ingest data from dozens or hundreds of REST APIs, where maintaining individual pipelines for each source would be impractical. With this metadata-driven approach, your data engineering team can focus on data quality and analytics rather than pipeline maintenance.
Frequently Asked Questions
Q: What is a metadata-driven pipeline in Azure Data Factory?
A metadata-driven pipeline reads its configuration (API URLs, table names, parameters) from a metadata table rather than hardcoding values. This allows you to add new data sources by inserting rows into the metadata table without modifying the pipeline. In this architecture, the API_METADATA table in Snowflake stores all API configurations, and the ADF Lookup activity reads these at runtime to drive the ForEach processing loop.
Q: Why use VARIANT columns in Snowflake for JSON data?
Snowflake's VARIANT data type natively stores semi-structured data like JSON. It preserves the full JSON structure and allows you to query nested fields using dot notation and LATERAL FLATTEN, eliminating the need to define a rigid schema upfront. This is especially useful for REST API responses where the JSON structure may vary between APIs or change over time without requiring table schema modifications.
Q: How does the pipeline handle API failures?
The pipeline uses ADF's built-in error handling with On Failure dependency paths. If the REST Copy Activity or the Snowflake load Copy Activity fails, execution flows to a failure Script Activity that logs the error message and marks the status as FAILED in the EXECUTION_LOG table. This allows operations teams to monitor failures, investigate root causes, and retry specific APIs without rerunning the entire pipeline.