How to Create a Data Pipeline in Azure Data Factory (ADF): A Complete Guide

Celestinfo Software Solutions Pvt. Ltd. Nov 21, 2024

Last updated: December 2024

Quick answer: To create a data pipeline in Azure Data Factory (ADF), set up a Storage Account and upload your source data, create Linked Services for your source (e.g., Blob Storage) and sink (another Blob container or an Azure SQL Database), define Datasets for input and output, then build a pipeline with a Copy Activity that maps source columns to destination columns. Trigger the pipeline and monitor execution in ADF’s Monitor tab.

Azure Data Factory (ADF) is Microsoft’s cloud-based data integration service for building automated data pipelines at scale. Whether you need to copy files from Blob Storage to Azure SQL Database or orchestrate complex ETL processes, ADF provides a visual, code-free interface to create, schedule, and monitor data pipelines. For more advanced ADF patterns, see our guide on building a metadata-driven ADF pipeline from REST API to Snowflake.

In this step-by-step guide, you will learn how to build a data pipeline in ADF from scratch, covering storage account setup, linked services, datasets, pipeline creation, and monitoring.

Prerequisites

Before starting, make sure you have:

  • An Azure account
  • A Data Factory created
  • A CSV file stored in Azure Blob Storage
  • (Optional) An Azure SQL Database with a table ready to receive data, if you want a SQL sink instead of Blob Storage

The diagram below shows the copy activity structure of a pipeline: a source dataset feeding a Copy Activity that writes to a sink dataset. This is the basic structure for any copy pipeline.

ADF pipeline copy activity structure diagram

We will perform the following steps in detail through hands-on practice.

Steps to Create the Pipeline

Step 1: Create a Storage Account

You need a storage account to hold your data files.

  • Log in to the Azure Portal.
  • Click Create a resource → search for Storage account → click Create.
  • Fill in the following details:
    • Resource Group: Choose or create a new one.
    • Storage account name: Example - mystorage123 (must be globally unique; 3–24 lowercase letters and numbers).
    • Region: Choose your region.
    • Performance: Standard is fine.
    • Redundancy: Locally-redundant (LRS) is enough for learning.
  • Click Review + Create → Create.
  • Once deployed, open your storage account.
  • Go to Containers → + Container → give a name (like inputdata) → set public access to Private → Create.
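
The portal clicks above correspond to ARM resources behind the scenes. As a rough sketch in Python (the names mystorage123 and inputdata are the example values from this guide, and the apiVersion values are illustrative):

```python
import json

# Sketch of the ARM resources Step 1 creates via the portal.
# "mystorage123" / "inputdata" are the example names used above;
# apiVersion values are illustrative and may differ in your tenant.
storage_account = {
    "type": "Microsoft.Storage/storageAccounts",
    "apiVersion": "2023-01-01",
    "name": "mystorage123",           # must be globally unique
    "location": "eastus",             # choose your region
    "sku": {"name": "Standard_LRS"},  # locally-redundant storage
    "kind": "StorageV2",
}

container = {
    "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
    "apiVersion": "2023-01-01",
    "name": "mystorage123/default/inputdata",
    "properties": {"publicAccess": "None"},  # Private access level
}

print(json.dumps({"resources": [storage_account, container]}, indent=2))
```

This is only a sketch of the shapes involved; creating the account through the portal as described above is the simplest path for learning.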

Step 2: Create a Data Factory

  • Log in to the Azure Portal.
  • Click Create a resource → search for Data Factory → click Create.
  • Azure Data Factory creation in Azure Portal
  • Fill in the details (Name, Region, Resource Group, etc.).
  • Click Review + Create → Create.

Step 3: Open ADF Studio

  • Once created, go to the resource → click Launch Studio (labelled Author & Monitor in older portals). This opens ADF Studio.
  • ADF Studio Author and Monitor screen

Step 4: Create Linked Services

Linked Services are your connections to data stores.

  • Source (Blob Storage): Go to Manage → Linked Services → New → Azure Blob Storage. Add connection details and test the connection.
  • Azure Blob Storage linked service configuration
  • Sink: In this walkthrough the sink is also Blob Storage, so create a second Azure Blob Storage linked service the same way. (A common production sink is an Azure SQL Database instead.)
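
Each linked service you create in Manage is stored by ADF as a JSON definition. A minimal sketch of a Blob Storage linked service, built as a Python dict (the name LS_BlobStorage is illustrative, and the connection string is a placeholder; property names follow ADF's AzureBlobStorage linked service schema):

```python
import json

# Illustrative linked service definition. LS_BlobStorage is an example name,
# and the connection string is a placeholder; ADF stores real secrets in the
# service itself or in Azure Key Vault, never in plain JSON.
blob_linked_service = {
    "name": "LS_BlobStorage",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": (
                "DefaultEndpointsProtocol=https;"
                "AccountName=mystorage123;AccountKey=<key>"
            )
        },
    },
}

print(json.dumps(blob_linked_service, indent=2))
```

You can view the real JSON for any linked service in ADF Studio via its code ({}) button, which is a good way to check your configuration.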

Step 5: Create Datasets

Datasets describe your data.

  • Source Dataset (Blob): Choose DelimitedText → link to your Blob container → select your CSV file.
  • ADF source dataset configuration for DelimitedText
  • Sink Dataset (Blob): Created the same way: choose DelimitedText and link to your Blob container, but point it at the output folder and file name where the copied data should land.
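
Since the source and sink datasets differ only in name and location, their JSON definitions can be sketched with one small helper (all names here, like LS_BlobStorage, DS_SourceCsv, and the file names, are illustrative; property names follow ADF's DelimitedText dataset schema):

```python
import json

def delimited_text_dataset(name, container, file_name):
    """Sketch of an ADF DelimitedText dataset definition.

    LS_BlobStorage is an illustrative linked-service name; property
    names follow ADF's dataset JSON schema.
    """
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": "LS_BlobStorage",
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    "fileName": file_name,
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
        },
    }

source_ds = delimited_text_dataset("DS_SourceCsv", "inputdata", "input.csv")
sink_ds = delimited_text_dataset("DS_SinkCsv", "outputdata", "output.csv")
print(json.dumps([source_ds, sink_ds], indent=2))
```

Note how the two datasets share a linked service: the connection is defined once, and each dataset only adds the data's location and format.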

Step 6: Build the Pipeline

  • Go to Author → Pipelines → New Pipeline.
  • Drag a Copy Data activity onto the canvas.
  • ADF pipeline canvas with Copy Data activity
  • In Source, choose your source (Blob) dataset.
  • In Sink, choose your sink dataset.
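
Behind the canvas, the pipeline itself is also JSON. A minimal sketch of what a one-activity copy pipeline looks like as a Python dict (pipeline, activity, and dataset names are illustrative; the structure follows ADF's Copy activity JSON schema):

```python
import json

# Sketch of the pipeline JSON for Step 6: a single Copy activity that reads
# the source dataset and writes the sink dataset. PL_CopyCsv, CopyCsvToBlob,
# DS_SourceCsv, and DS_SinkCsv are all illustrative names.
pipeline = {
    "name": "PL_CopyCsv",
    "properties": {
        "activities": [
            {
                "name": "CopyCsvToBlob",
                "type": "Copy",
                "inputs": [
                    {"referenceName": "DS_SourceCsv", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "DS_SinkCsv", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "DelimitedTextSink"},
                },
            }
        ]
    },
}
print(json.dumps(pipeline, indent=2))
```

The source and sink types here mirror the dataset format: DelimitedText datasets get DelimitedTextSource/DelimitedTextSink; a SQL sink would use a different sink type.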

Step 7: Test and Run

  • Click Validate to check the pipeline for errors, then Debug to do a test run.
  • ADF pipeline validate and debug run results
  • Once it works, click Add Trigger → Trigger Now to run it.
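
Trigger Now runs the pipeline once on demand; for recurring runs you can attach a schedule trigger instead. A sketch of a daily schedule trigger as a Python dict (TR_Daily and PL_CopyCsv are illustrative names, the startTime is a placeholder, and the structure follows ADF's ScheduleTrigger JSON schema):

```python
import json

# Illustrative daily schedule trigger attached to the example pipeline.
# TR_Daily / PL_CopyCsv are example names; startTime is a placeholder.
trigger = {
    "name": "TR_Daily",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",   # run once per day
                "interval": 1,
                "startTime": "2024-12-01T00:00:00Z",
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "PL_CopyCsv",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}
print(json.dumps(trigger, indent=2))
```

In ADF Studio the same thing is done visually via Add Trigger → New/Edit; remember that a schedule trigger only fires after it has been published.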

Step 8: Monitor the Pipeline

  • Go to the Monitor tab.
  • You can see each run’s status, duration, and errors (if any).

Conclusion

Azure Data Factory makes it straightforward to build data pipelines without complex coding: you connect data stores through linked services, describe your data with datasets, and design, schedule, and monitor pipelines visually in one place. Through self-hosted integration runtimes it reaches on-premises data sources as well as cloud ones, and triggers let you automate anything from a daily refresh to event-driven loads. In short, ADF is a one-stop service for creating efficient, reusable, automated data pipelines in the cloud. To extend your ADF pipelines with Fabric and Snowflake integration, see our guide on Azure SQL to Snowflake via Fabric and OneLake.

Frequently Asked Questions

What is Azure Data Factory used for?

Azure Data Factory (ADF) is a cloud-based data integration service that allows you to create, schedule, and manage data pipelines for moving and transforming data across various sources and destinations.

Do I need coding skills to create a pipeline in ADF?

No, ADF provides a visual drag-and-drop interface in ADF Studio that lets you build pipelines without writing code. However, you can use expressions and custom scripts for advanced transformations.

What are Linked Services and Datasets in ADF?

Linked Services define the connection information to data stores (like Azure Blob Storage or SQL Database), while Datasets represent the structure and location of data within those stores that your pipeline reads from or writes to.

How can I monitor my ADF pipeline runs?

ADF provides a built-in Monitor tab where you can view pipeline run history, check status and duration, inspect activity-level details, and troubleshoot any errors that occurred during execution.

Chandra Sekhar, Senior ETL Engineer

Chandra Sekhar is a Senior ETL Engineer at CelestInfo specializing in Talend, Azure Data Factory, and building high-performance data integration pipelines.
