
Azure Data Factory Self-Hosted Integration Runtime: Setup, Troubleshooting, and Best Practices

Celestinfo Software Solutions Pvt. Ltd. May 01, 2025

Quick answer: The self-hosted integration runtime (SHIR) lets Azure Data Factory connect to on-premises databases, private VNet resources, and endpoints that aren't publicly accessible. Install it on a Windows machine with network access to your data source, register it with your ADF instance using an authentication key, and create linked services that reference the SHIR. All data flows through the SHIR machine -- plan your VM sizing accordingly.

Last updated: May 2025

When You Need a Self-Hosted IR

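Azure Data Factory's managed (Azure-hosted) integration runtime handles connections to cloud services with public endpoints, such as Azure SQL Database, Blob Storage, and Snowflake. But plenty of enterprise data still lives behind firewalls, and the self-hosted IR bridges that gap. You'll need one when connecting to:

  - On-premises databases (SQL Server, Oracle, MySQL)
  - Resources inside a private VNet without public endpoints
  - S3 buckets accessible only via VPN
  - File shares on your corporate network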


Installation Requirements


The SHIR is Windows-only. There's no Linux installer. Here's what the machine needs:


Requirement | Minimum | Recommended (Production)
--- | --- | ---
OS | Windows Server 2016 or later | Windows Server 2019/2022
.NET Framework | 4.7.2 | 4.8 or later
CPU | 4 cores | 8+ cores
RAM | 8 GB | 16+ GB
Disk | 80 GB free | SSD recommended
Network | Outbound port 443, plus your data sources' ports (e.g., 1433); 9350-9354 only for legacy relay mode | Dedicated NIC for data traffic

Don't install the SHIR on a machine running other heavy workloads. It competes for CPU and memory during copy operations, and a resource-starved SHIR is the number one cause of slow data transfers.
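A quick pre-flight check can catch an undersized VM before you install anything. The sketch below is an illustration, not a Microsoft tool: it compares cores, RAM, and free disk against the minimums in the table above. The names `MachineSpec`, `check_minimums`, and `local_spec` are hypothetical, OS and .NET versions are not checked, and RAM is passed in by hand because Python's standard library has no portable way to read installed memory.

```python
import os
import shutil
from dataclasses import dataclass
from typing import List

# Minimums copied from the requirements table (OS and .NET are not checked here).
MIN_CORES = 4
MIN_RAM_GB = 8
MIN_DISK_GB = 80


@dataclass
class MachineSpec:
    cores: int
    ram_gb: float
    free_disk_gb: float


def check_minimums(spec: MachineSpec) -> List[str]:
    """Return one message per minimum the machine fails; empty means it passes."""
    problems = []
    if spec.cores < MIN_CORES:
        problems.append(f"CPU: {spec.cores} cores, need {MIN_CORES}")
    if spec.ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {spec.ram_gb} GB, need {MIN_RAM_GB} GB")
    if spec.free_disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {spec.free_disk_gb:.0f} GB free, need {MIN_DISK_GB} GB")
    return problems


def local_spec(ram_gb: float, path: str = ".") -> MachineSpec:
    """Read core count and free disk space for `path` from the standard library.

    RAM is supplied by the caller (hypothetical input, e.g. from Task Manager).
    """
    free_gb = shutil.disk_usage(path).free / 1024**3
    return MachineSpec(cores=os.cpu_count() or 0, ram_gb=ram_gb, free_disk_gb=free_gb)


if __name__ == "__main__":
    # Point `path` at the drive you plan to install to, e.g. "C:\\".
    issues = check_minimums(local_spec(ram_gb=16))
    print("\n".join(issues) or "Meets the listed minimums")
```

Passing a production target instead (8 cores, 16 GB) is a one-line change to the constants.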


Step-by-Step: Installing SHIR on a Windows VM


  1. Create the IR in ADF. In Azure Data Factory Studio, go to Manage → Integration runtimes → New → Azure, Self-Hosted → Self-Hosted. Give it a name, create it, and copy one of the authentication keys (Key1 or Key2).
  2. Download the installer. The IR creation page provides a direct download link; the installer is about 800 MB. Alternatively, download it from the Microsoft Download Center.
  3. Run the installer on your Windows VM. Accept the defaults; the Windows service (Integration Runtime Service, service name DIAHostService) installs and starts automatically.
  4. Register with your ADF. When setup finishes, Microsoft Integration Runtime Configuration Manager opens and prompts for the authentication key. Paste the key from step 1. The IR connects to Azure and shows as "Running" in the ADF portal within 1-2 minutes.
  5. Verify connectivity. In the IR Configuration Manager (runs on the SHIR machine), use "Diagnostics" to test connections to your on-premises data sources.
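If you build SHIR VMs repeatedly, steps 2-4 can be scripted. The sketch below only assembles the commands and, by default, prints them instead of running them. It assumes the standard `msiexec` silent-install switches, a default install path with a `5.0` version folder, and the `dmgcmd.exe -RegisterNewNode "<key>" "<node name>"` registration switch; treat the path and switch as assumptions and confirm them against the help output of your installed `dmgcmd.exe`. The MSI file name and node name are hypothetical.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import List, Optional

# Assumed default install location; the version folder ("5.0") may differ.
DMGCMD = PureWindowsPath(
    r"C:\Program Files\Microsoft Integration Runtime\5.0\Shared\dmgcmd.exe"
)


def install_command(msi_path: str) -> List[str]:
    """Silent install using standard msiexec switches."""
    return ["msiexec", "/i", msi_path, "/quiet", "/norestart"]


def register_command(auth_key: str, node_name: str) -> List[str]:
    """Register this machine as an IR node with the key copied in step 1.

    Assumed switch name; verify it against your installed version's dmgcmd help.
    """
    return [str(DMGCMD), "-RegisterNewNode", auth_key, node_name]


def masked(cmd: List[str], secret: Optional[str]) -> str:
    """Render a command for logging without exposing the authentication key."""
    return " ".join("***" if secret and part == secret else part for part in cmd)


def run(cmd: List[str], secret: Optional[str] = None, dry_run: bool = True) -> None:
    print(masked(cmd, secret))
    if not dry_run:
        subprocess.run(cmd, check=True)  # Windows only, from an elevated prompt


if __name__ == "__main__":
    key = "<paste Key1 here>"  # placeholder; never commit a real key
    run(install_command(r"C:\temp\IntegrationRuntime.msi"))  # hypothetical file name
    run(register_command(key, "shir-node-01"), secret=key)
```

Keeping `dry_run=True` as the default means a stray invocation only prints the plan, and the masking keeps the key out of build logs.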

Connecting to On-Premises SQL Server


With the SHIR installed and registered, create a linked service in ADF. If you're new to ADF pipelines, our pipeline creation guide covers the fundamentals.


JSON -- Linked Service Configuration
{
  "name": "OnPremSqlServer",
  "properties": {
    "type": "SqlServer",
    "typeProperties": {
      "connectionString": "Server=192.168.1.50;Database=SalesDB;Integrated Security=False;User ID=adf_reader;",
      "encryptedCredential": "..."
    },
    "connectVia": {
      "referenceName": "MyOnPremSHIR",
      "type": "IntegrationRuntimeReference"
    }
  }
}

The connectVia section is the key -- it tells ADF to route this connection through your self-hosted IR instead of the Azure-managed IR. The SHIR machine must be able to reach the SQL Server on port 1433 (or whatever port your SQL Server listens on).
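Because a missing `connectVia` silently falls back to the default Azure IR, it can be worth checking exported linked-service JSON in CI. The sketch below is a hypothetical helper, `routing_ir`, that reads a definition like the one above and returns the IR it routes through, or `None` when no `connectVia` is present.

```python
import json
from typing import Optional


def routing_ir(linked_service_json: str) -> Optional[str]:
    """Name of the integration runtime a linked service connects through.

    Returns None when connectVia is absent, in which case ADF falls back to
    the default Azure integration runtime.
    """
    doc = json.loads(linked_service_json)
    props = doc.get("properties", doc)  # tolerate definitions without the wrapper
    via = props.get("connectVia")
    if via is None:
        return None
    if via.get("type") != "IntegrationRuntimeReference":
        raise ValueError(f"unexpected connectVia type: {via.get('type')!r}")
    return via["referenceName"]


if __name__ == "__main__":
    sample = """{"name": "OnPremSqlServer",
                 "properties": {"type": "SqlServer",
                                "connectVia": {"referenceName": "MyOnPremSHIR",
                                               "type": "IntegrationRuntimeReference"}}}"""
    print(routing_ir(sample))  # MyOnPremSHIR
```

A CI step could assert that every linked service pointing at an on-premises host returns your SHIR's name rather than `None`.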


For credentials, you have two options: store the password in the linked service, where the SHIR encrypts it (the JSON then carries only an encryptedCredential value), or reference an Azure Key Vault secret. Key Vault is strongly recommended for production because it centralizes credential management and supports rotation.
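The Key Vault option replaces the password with an `AzureKeyVaultSecret` reference inside `typeProperties`. The sketch below generates such a definition, following the schema ADF documents for SQL Server linked services. It assumes an Azure Key Vault linked service already exists and that the factory's managed identity can read secrets from the vault; the names `AzureKeyVaultLS` and `salesdb-adf-reader-password` are hypothetical.

```python
import json


def sql_server_linked_service(name: str, server: str, database: str, user: str,
                              ir_name: str, akv_linked_service: str,
                              secret_name: str) -> dict:
    """ADF SqlServer linked service whose password is an Azure Key Vault secret."""
    return {
        "name": name,
        "properties": {
            "type": "SqlServer",
            "typeProperties": {
                # No password here: ADF resolves it from Key Vault at runtime.
                "connectionString": (
                    f"Server={server};Database={database};"
                    f"Integrated Security=False;User ID={user};"
                ),
                "password": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": akv_linked_service,
                        "type": "LinkedServiceReference",
                    },
                    "secretName": secret_name,
                },
            },
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


if __name__ == "__main__":
    ls = sql_server_linked_service(
        "OnPremSqlServer", "192.168.1.50", "SalesDB", "adf_reader",
        "MyOnPremSHIR", "AzureKeyVaultLS", "salesdb-adf-reader-password",
    )
    print(json.dumps(ls, indent=2))
```

Rotating the password then means updating the secret in Key Vault; the linked service definition doesn't change.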


High Availability: Multi-Node Setup


A single SHIR node is a single point of failure. If that VM goes down for patching, your pipelines fail. For production, set up 2-4 SHIR nodes sharing the same authentication key. ADF automatically load-balances copy activities across available nodes and fails over if one node goes offline.
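Failover only helps if the surviving nodes can absorb the load. The arithmetic below is a simplified model, not ADF's actual scheduler: it treats cluster capacity as the sum of each online node's concurrent-jobs limit (the per-node limit you can view and adjust in the portal) and asks whether peak load still fits with one node down for patching. `jobs_per_node` and `peak_jobs` are hypothetical inputs you'd take from your own monitoring.

```python
MAX_NODES = 4  # a self-hosted IR supports at most four nodes


def cluster_capacity(nodes: int, jobs_per_node: int, nodes_down: int = 0) -> int:
    """Concurrent jobs the IR can run while `nodes_down` nodes are offline."""
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError(f"node count must be between 1 and {MAX_NODES}")
    if not 0 <= nodes_down <= nodes:
        raise ValueError("nodes_down must be between 0 and nodes")
    return (nodes - nodes_down) * jobs_per_node


def survives_patching(nodes: int, jobs_per_node: int, peak_jobs: int) -> bool:
    """True if peak concurrent load still fits with one node down for patching."""
    return cluster_capacity(nodes, jobs_per_node, nodes_down=1) >= peak_jobs


if __name__ == "__main__":
    for n in range(1, MAX_NODES + 1):
        print(n, "nodes:", survives_patching(n, jobs_per_node=8, peak_jobs=12))
```

With 8 jobs per node and a peak of 12, only the three- and four-node layouts keep up during patching.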


To add a node: install the SHIR software on a second Windows VM and register it with the same authentication key as the first node. ADF recognizes it as a secondary node within minutes. All nodes must have identical network access to your data sources.
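"Identical network access" is easy to verify if you collect reachability results on each node (for example with Test-NetConnection) and compare them. The hypothetical helper below takes `{node: {"host:port": reachable}}` and reports, per node, the targets it can't reach even though another node can. The node names and targets are illustrative.

```python
from typing import Dict, List

Reachability = Dict[str, Dict[str, bool]]  # node -> {"host:port": reachable}


def parity_gaps(results: Reachability) -> Dict[str, List[str]]:
    """For each node, list targets it can't reach that some other node can."""
    reachable_somewhere = {
        target
        for per_node in results.values()
        for target, ok in per_node.items()
        if ok
    }
    gaps = {}
    for node, per_node in results.items():
        missing = sorted(t for t in reachable_somewhere if not per_node.get(t, False))
        if missing:
            gaps[node] = missing
    return gaps


if __name__ == "__main__":
    results = {
        "shir-node-01": {"sql01:1433": True, "files01:445": True},
        "shir-node-02": {"sql01:1433": True, "files01:445": False},
    }
    print(parity_gaps(results))  # {'shir-node-02': ['files01:445']}
```

A node listed in the output is one where copy activities will fail intermittently, depending on which node ADF picks.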


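Limitation: Mapping Data Flows don't run on a self-hosted IR at all, whether single-node or multi-node. They execute on Spark compute provisioned by an Azure IR. To feed a Data Flow from a private source, either use an Azure IR with a managed virtual network and private endpoints, or first land the data in Azure with a Copy activity that runs through the SHIR.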


Networking and Firewall Rules


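The SHIR initiates all of its connections to Azure outbound, so you don't need to open inbound ports from the internet. (In a multi-node setup, nodes also talk to each other over your intranet, on port 8060 by default.) Required outbound rules: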


Port | Destination | Purpose
--- | --- | ---
443 (HTTPS) | *.frontend.clouddatahub.net | Connection to the Data Factory service
443 (HTTPS) | *.servicebus.windows.net | Interactive authoring (test connection, data preview)
443 (HTTPS) | *.core.windows.net | Staging data in Azure Blob/ADLS
443 (HTTPS) | download.microsoft.com | Auto-update downloads
1433 | Your SQL Server IP | Database connectivity (varies by source)
9350-9354 | Azure Service Bus relay | Relay-based communication (legacy mode)
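To confirm the rules above from the SHIR machine itself, a plain TCP probe does the same job as `Test-NetConnection -Port`. The sketch below is a minimal version. It assumes you replace wildcard entries with concrete host names (your actual Service Bus namespace, storage account, and so on). A successful TCP connect doesn't prove TLS or proxy traversal, and if the SHIR reaches Azure through a proxy, direct probes may fail even though the SHIR works.

```python
import socket
import sys
from typing import List, Tuple


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Open and immediately close a TCP connection, like Test-NetConnection -Port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refusal, and timeout all land here
        return False


def parse_targets(args: List[str]) -> List[Tuple[str, int]]:
    """Turn ["host:443", ...] into [("host", 443), ...]."""
    targets = []
    for arg in args:
        host, _, port = arg.rpartition(":")
        targets.append((host, int(port)))
    return targets


if __name__ == "__main__":
    # e.g. python check_ports.py yournamespace.servicebus.windows.net:443 192.168.1.50:1433
    for host, port in parse_targets(sys.argv[1:]):
        print(f"{host}:{port}", "open" if can_connect(host, port) else "BLOCKED")
```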

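If your organization uses a proxy server, configure it in the SHIR Configuration Manager under Settings → HTTP Proxy → Change. You can enter a custom HTTP proxy (address, port, and optional basic-authentication credentials) or choose "Use system proxy", in which case the SHIR reads the proxy settings from diahost.exe.config and diawp.exe.config. Proxies that authenticate with Windows credentials, such as NTLM proxies, need this config-file route. Restart the Integration Runtime service after editing either file.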
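When auditing a "Use system proxy" setup, it helps to see exactly what the config files say. The sketch below reads the standard .NET `<system.net><defaultProxy>` element from a config file. The install path (including the `5.0` version folder) is an assumption, and the same check applies to diawp.exe.config.

```python
import os
import xml.etree.ElementTree as ET
from typing import Optional, Union

# Assumed default path; the version folder may differ on your machine.
DIAHOST_CONFIG = r"C:\Program Files\Microsoft Integration Runtime\5.0\Shared\diahost.exe.config"


def _child(element: ET.Element, tag: str) -> Optional[ET.Element]:
    """First direct child with the given tag (tags like 'system.net' contain dots)."""
    return next((c for c in element if c.tag == tag), None)


def read_default_proxy(config_xml: Union[str, bytes]) -> Optional[dict]:
    """Extract <system.net><defaultProxy> settings from a .NET config file."""
    root = ET.fromstring(config_xml)
    system_net = _child(root, "system.net")
    default_proxy = _child(system_net, "defaultProxy") if system_net is not None else None
    if default_proxy is None:
        return None  # with "Use system proxy" selected, this means a direct connection
    proxy = _child(default_proxy, "proxy")
    return {
        "useDefaultCredentials": default_proxy.get("useDefaultCredentials", "").lower() == "true",
        "proxyaddress": proxy.get("proxyaddress") if proxy is not None else None,
        "bypassonlocal": proxy.get("bypassonlocal") if proxy is not None else None,
    }


if __name__ == "__main__":
    if os.path.exists(DIAHOST_CONFIG):
        with open(DIAHOST_CONFIG, "rb") as f:  # bytes: let the XML parser handle encoding
            print(read_default_proxy(f.read()))
```

`useDefaultCredentials="true"` is the setting that lets the service account's Windows credentials be presented to the proxy.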


Performance Tuning


Remember: all data flows through the SHIR machine's network. If you're copying 100GB from on-premises SQL Server to Azure Blob Storage, that 100GB travels from SQL Server to the SHIR VM, then from the SHIR VM up to Azure. The SHIR's network bandwidth is your bottleneck.
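Since every byte crosses both hops, the slower hop sets the pace. The back-of-envelope estimate below makes that concrete. The bandwidth figures are hypothetical, and the model ignores CPU limits on the SHIR, source read speed, and any sharing of a single link between the two legs.

```python
from typing import Sequence


def copy_seconds(size_gb: float, hop_mbps: Sequence[float], efficiency: float = 1.0) -> float:
    """Rough time to push size_gb through a chain of network hops.

    Every byte crosses every hop, so the slowest hop sets the pace.
    Units: size_gb in decimal GB (10**9 bytes), hop_mbps in megabits/second.
    efficiency in (0, 1] discounts protocol overhead and contention.
    """
    if not hop_mbps or min(hop_mbps) <= 0:
        raise ValueError("need at least one hop with positive bandwidth")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_gb * 1e9 * 8
    return bits / (min(hop_mbps) * 1e6 * efficiency)


if __name__ == "__main__":
    # 100 GB: SQL Server -> SHIR over a 1 Gbps LAN, SHIR -> Azure over a 200 Mbps uplink.
    secs = copy_seconds(100, [1000, 200])
    print(f"{secs:.0f} s (~{secs / 60:.0f} min)")  # 4000 s (~67 min)
```

Here the 200 Mbps uplink, not the LAN, caps the copy at roughly 67 minutes even before overhead.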



Common Errors and Fixes


"The self-hosted integration runtime is offline"

This is the most common error. Causes, in order of likelihood:

  1. Firewall blocking outbound port 443 to the Data Factory endpoints (*.frontend.clouddatahub.net and *.servicebus.windows.net). Test with Test-NetConnection -ComputerName yournamespace.servicebus.windows.net -Port 443.
  2. Authentication key regenerated. Re-register the SHIR with the current key from ADF → Manage → Integration runtimes.
  3. Windows service stopped. Check whether the Integration Runtime Service (DIAHostService) is running, and restart it if not.
  4. Proxy misconfiguration. If you recently changed proxy settings, the SHIR may fail to connect. Verify in Configuration Manager.
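Cause 3 is easy to check from a script or monitoring agent. The sketch below shells out to Windows' `sc.exe query` and parses the state word out of its output. It assumes the service name is `DIAHostService`; confirm the name in services.msc on your machine.

```python
import re
import subprocess
import sys
from typing import Optional

# Assumed Windows service name of the Integration Runtime Service.
SERVICE_NAME = "DIAHostService"


def parse_sc_state(sc_output: str) -> Optional[str]:
    """Pull the state word (RUNNING, STOPPED, ...) out of `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def service_state(name: str = SERVICE_NAME) -> Optional[str]:
    """Query the Windows service control manager (Windows only)."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return parse_sc_state(result.stdout)


if __name__ == "__main__" and sys.platform == "win32":
    state = service_state()
    print(state or "service not found")
    if state == "STOPPED":
        print(f"Start it from an elevated prompt: sc.exe start {SERVICE_NAME}")
```

If you type the command in PowerShell, use `sc.exe` explicitly, because `sc` there is an alias for `Set-Content`.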

"Unable to connect to remote server"

This typically means the SHIR can't reach your on-premises data source. Check:


"Connection timed out" during copy

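Usually a network bandwidth issue: the SHIR machine's NIC can't push data fast enough. Remedies: upgrade to a faster NIC, enable staged copy with compression (the enableCompression setting under the Copy activity's stagingSettings) so fewer bytes cross the slow link, or reduce the volume per run with a WHERE clause filter.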
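One way to apply the WHERE clause remedy is to split a large table into date windows and copy each window in its own run (for example, by iterating over the queries with a ForEach activity). The generator below is a minimal sketch of that idea. The table `dbo.Sales` and column `OrderDate` are hypothetical, and because names are interpolated directly into SQL, pass only trusted identifiers.

```python
from datetime import date, timedelta
from typing import Iterator


def window_queries(table: str, column: str, start: date, end: date,
                   days: int) -> Iterator[str]:
    """Yield half-open [lo, hi) date-range queries that together cover start..end.

    table and column are interpolated directly, so pass only trusted names.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    lo = start
    while lo < end:
        hi = min(lo + timedelta(days=days), end)
        yield (f"SELECT * FROM {table} "
               f"WHERE {column} >= '{lo.isoformat()}' AND {column} < '{hi.isoformat()}'")
        lo = hi


if __name__ == "__main__":
    for q in window_queries("dbo.Sales", "OrderDate", date(2025, 1, 1), date(2025, 1, 15), 7):
        print(q)
```

Half-open ranges (`>=` start, `<` end) ensure a row stamped exactly on a boundary lands in one window, never two.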


Upgrade Management


Microsoft releases SHIR updates monthly. By default, auto-update is enabled and the SHIR updates itself during a maintenance window you configure (default: 2 AM - 6 AM). During updates, running activities complete but new activities queue until the update finishes (typically 5-10 minutes).


For tightly controlled environments, disable auto-update and manage upgrades manually. You can be up to 4 versions behind the latest before ADF starts warning you. More than 4 versions behind and some features may stop working. For teams building metadata-driven pipelines, see our REST API to Snowflake pipeline guide.
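If you manage upgrades manually, a small check can flag when a node drifts past the 4-version threshold described above. The sketch assumes you maintain a list of released version strings yourself (for example, from the SHIR release notes) and read the installed version from the ADF portal or Configuration Manager. The version numbers in the tests are hypothetical.

```python
from typing import Iterable, Tuple

MAX_LAG = 4  # versions behind before ADF starts warning, per the guidance above


def parse_version(version: str) -> Tuple[int, ...]:
    """'5.44.8984.1' -> (5, 44, 8984, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def versions_behind(installed: str, released: Iterable[str]) -> int:
    """Number of distinct released versions newer than the installed one."""
    current = parse_version(installed)
    return sum(1 for v in set(released) if parse_version(v) > current)


def needs_upgrade(installed: str, released: Iterable[str]) -> bool:
    """True once the node is more than MAX_LAG versions behind."""
    return versions_behind(installed, released) > MAX_LAG
```

Comparing integer tuples rather than strings matters: as strings, "5.9" would sort after "5.10".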


Key Takeaways


Chandra Sekhar, Senior ETL Engineer

Chandra Sekhar is a Senior ETL Engineer at CelestInfo specializing in Talend, Azure Data Factory, and building high-performance data integration pipelines.


Frequently Asked Questions

Q: When do I need a self-hosted integration runtime in ADF?

You need a self-hosted IR when connecting to on-premises databases (SQL Server, Oracle, MySQL), resources inside a private VNet without public endpoints, S3 buckets accessible only via VPN, or file shares on your corporate network. Any data source that isn't directly reachable from Azure's managed integration runtime requires a self-hosted IR.
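A quick heuristic for "isn't directly reachable": if a source's host name resolves only to private addresses (RFC 1918 ranges, IPv6 ULA, loopback), the Azure-hosted IR can't reach it over the internet. The sketch below checks that. It is only a hint. An Azure IR in a managed virtual network can reach some private resources through private endpoints, and a public address behind an IP allowlist may still need a SHIR. The example addresses are illustrative.

```python
import ipaddress
import socket


def resolves_only_to_private(host: str) -> bool:
    """True if every address `host` resolves to is private (RFC 1918, ULA, loopback...)."""
    addresses = {info[4][0] for info in socket.getaddrinfo(host, None)}
    # Strip IPv6 zone IDs such as 'fe80::1%eth0' before parsing.
    return all(ipaddress.ip_address(a.split("%")[0]).is_private for a in addresses)


if __name__ == "__main__":
    for host in ["192.168.1.50", "8.8.8.8"]:
        print(host, resolves_only_to_private(host))
```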

Q: What are the hardware requirements for self-hosted IR?

Minimum: Windows Server 2016+, .NET Framework 4.7.2, 4 CPU cores, 8 GB RAM, and 80 GB of disk. For production workloads copying large data volumes, we recommend 8+ cores, 16 GB+ RAM, and SSD storage. The machine needs outbound access on port 443, plus whatever ports your data sources use (for example, 1433 for SQL Server); ports 9350-9354 are needed only for legacy relay mode.

Q: Why does my self-hosted IR show as offline?

Common causes include a firewall blocking outbound port 443 to the Data Factory and Azure Service Bus endpoints, a regenerated authentication key, the Windows service being stopped, or proxy misconfiguration. Start by verifying outbound connectivity on port 443, then check the IR Configuration Manager logs on the SHIR machine for specific error details.

Q: Does data flow through the self-hosted IR machine during copy operations?

Yes. The SHIR processes and transfers data on the machine where it runs. Copying 100GB from on-premises SQL Server to Azure means that traffic flows through the SHIR machine's network interface. The machine's bandwidth, CPU, and memory directly affect copy performance, so size the VM accordingly.
