Azure Data Factory Self-Hosted Integration Runtime: Setup, Troubleshooting, and Best Practices

Chandra Sekhar

Published May 01, 2025 · Last updated Feb 2026

Celestinfo Software Solutions Pvt. Ltd. • May 01, 2025

Quick answer: The self-hosted integration runtime (SHIR) lets Azure Data Factory connect to on-premises databases, private VNet resources, and endpoints that aren't publicly accessible. Install it on a Windows machine with network access to your data source, register it with your ADF instance using an authentication key, and create linked services that reference the SHIR. All data flows through the SHIR machine -- plan your VM sizing accordingly.

Last updated: May 2025

When You Need a Self-Hosted IR

Azure Data Factory's managed (Azure-hosted) integration runtime handles connections to cloud services with public endpoints -- Azure SQL Database, Blob Storage, Snowflake, and similar. But plenty of enterprise data still lives behind firewalls. The self-hosted IR bridges that gap. You'll need one when connecting to:

On-premises databases: SQL Server, Oracle, MySQL, PostgreSQL, DB2 running in your data center
Private VNet resources: Azure SQL Managed Instance without a public endpoint, Azure VMs with private IPs only
S3 behind VPN: AWS S3 buckets accessible only through a VPN gateway or private link
File shares: SMB/NFS file shares on your corporate network
ODBC/JDBC sources: Any data source accessible only from within your network via ODBC or JDBC drivers

Installation Requirements

The SHIR is Windows-only. There's no Linux installer. Here's what the machine needs:

Requirement	Minimum	Recommended (Production)
OS	Windows Server 2016 or later	Windows Server 2019/2022
.NET Framework	4.7.2	4.8 or later
CPU	4 cores	8+ cores
RAM	8 GB	16+ GB
Disk	80 GB free	SSD recommended
Network	Outbound access on ports 443, 1433, 9350-9354	Dedicated NIC for data traffic

Don't install the SHIR on a machine running other heavy workloads. It competes for CPU and memory during copy operations, and a resource-starved SHIR is the number one cause of slow data transfers.
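As a rough pre-install check, the minimums from the table can be encoded and compared against a candidate machine. This is only an illustrative sketch: `MachineSpec`, `MINIMUM`, and `shortfalls` are hypothetical names, and the thresholds are simply the "Minimum" column above, not an official validation tool.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineSpec:
    cores: int
    ram_gb: float
    free_disk_gb: float


# Thresholds copied from the "Minimum" column of the requirements table.
MINIMUM = MachineSpec(cores=4, ram_gb=8, free_disk_gb=80)


def shortfalls(spec: MachineSpec, required: MachineSpec = MINIMUM) -> list[str]:
    """Return a readable list of the requirements this machine misses."""
    problems = []
    if spec.cores < required.cores:
        problems.append(f"CPU: {spec.cores} cores < {required.cores}")
    if spec.ram_gb < required.ram_gb:
        problems.append(f"RAM: {spec.ram_gb} GB < {required.ram_gb} GB")
    if spec.free_disk_gb < required.free_disk_gb:
        problems.append(
            f"Disk: {spec.free_disk_gb} GB free < {required.free_disk_gb} GB"
        )
    return problems


def local_cores_and_disk(path: str = ".") -> tuple[int, float]:
    """Read core count and free disk (GB) of the current machine.

    RAM is not available from the standard library, so the caller supplies it.
    """
    free_gb = shutil.disk_usage(path).free / 1024**3
    return os.cpu_count() or 0, free_gb
```

An empty list from `shortfalls(...)` means the machine clears the minimums. Passing a stricter `required` spec (for example 8 cores and 16 GB) checks against the production column instead.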

Step-by-Step: Installing SHIR on a Windows VM

1. Create the IR in ADF. Go to your Azure Data Factory portal → Manage → Integration runtimes → New → Self-Hosted. Give it a name and copy the authentication key (Key1 or Key2).
2. Download the installer. Microsoft provides a direct download link on the IR creation page. It's about 800MB. Alternatively, download from the Microsoft Integration Runtime download center.
3. Run the installer on your Windows VM. Accept defaults -- the service installs as "Microsoft Integration Runtime" and starts automatically.
4. Register with your ADF. The installer prompts for the authentication key. Paste the key from step 1. The IR connects to Azure and shows as "Running" in the ADF portal within 1-2 minutes.
5. Verify connectivity. In the IR Configuration Manager (runs on the SHIR machine), use "Diagnostics" to test connections to your on-premises data sources.

Connecting to On-Premises SQL Server

With the SHIR installed and registered, create a linked service in ADF. If you're new to ADF pipelines, our pipeline creation guide covers the fundamentals.

JSON -- Linked Service Configuration

{
  "name": "OnPremSqlServer",
  "properties": {
    "type": "SqlServer",
    "typeProperties": {
      "connectionString": "Server=192.168.1.50;Database=SalesDB;Integrated Security=False;User ID=adf_reader;Password=***",
      "encryptedCredential": "..."
    },
    "connectVia": {
      "referenceName": "MyOnPremSHIR",
      "type": "IntegrationRuntimeReference"
    }
  }
}

The connectVia section is the key -- it tells ADF to route this connection through your self-hosted IR instead of the Azure-managed IR. The SHIR machine must be able to reach the SQL Server on port 1433 (or whatever port your SQL Server listens on).

For credentials, you have two options: store the password in the linked service (it is encrypted and kept on the SHIR machine) or reference an Azure Key Vault secret. Key Vault is strongly recommended for production because it centralizes credential management and supports rotation.
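The Key Vault option can be sketched by generating the linked service definition in Python. The function name and the `KeyVaultLS` and `onprem-sql-password` values are hypothetical placeholders. The `AzureKeyVaultSecret` reference shape follows the ADF linked-service JSON format as commonly documented, so check it against the connector documentation for your version before relying on it.

```python
import json


def sql_server_linked_service(name, server, database, user, shir_name,
                              key_vault_ls, secret_name):
    """Build a SQL Server linked service whose password comes from Key Vault."""
    return {
        "name": name,
        "properties": {
            "type": "SqlServer",
            "typeProperties": {
                # No password in the connection string; ADF resolves the
                # secret from Key Vault when the activity runs.
                "connectionString": (
                    f"Server={server};Database={database};"
                    f"Integrated Security=False;User ID={user};"
                ),
                "password": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": key_vault_ls,
                        "type": "LinkedServiceReference",
                    },
                    "secretName": secret_name,
                },
            },
            # Route the connection through the self-hosted IR.
            "connectVia": {
                "referenceName": shir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


definition = sql_server_linked_service(
    name="OnPremSqlServer", server="192.168.1.50", database="SalesDB",
    user="adf_reader", shir_name="MyOnPremSHIR",
    key_vault_ls="KeyVaultLS",          # hypothetical Key Vault linked service
    secret_name="onprem-sql-password",  # hypothetical secret name
)
print(json.dumps(definition, indent=2))
```

Generating the definition from code keeps the secret itself out of source control: only the secret's name appears in the JSON.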

High Availability: Multi-Node Setup

A single SHIR node is a single point of failure. If that VM goes down for patching, your pipelines fail. For production, set up 2-4 SHIR nodes sharing the same authentication key. ADF automatically load-balances copy activities across available nodes and fails over if one node goes offline.

To add a node: install the SHIR software on a second Windows VM and register it with the same authentication key as the first node. ADF recognizes it as a secondary node within minutes. All nodes must have identical network access to your data sources.

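Limitation: the SHIR doesn't run Mapping Data Flows, in either single-node or multi-node setups. Data Flows execute on Spark clusters managed by the Azure integration runtime. To use on-premises data in a Data Flow, first land it in cloud storage with a Copy Activity that runs on the SHIR, then point the Data Flow at the staged copy.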

Networking and Firewall Rules

The SHIR makes outbound connections only -- it never accepts inbound traffic. Required outbound rules:

Port	Destination	Purpose
443 (HTTPS)	*.servicebus.windows.net	Communication with ADF control plane
443 (HTTPS)	*.core.windows.net	Staging data in Azure Blob/ADLS
443 (HTTPS)	download.microsoft.com	Auto-update downloads
1433	Your SQL Server IP	Database connectivity (varies by source)
9350-9354	Azure Service Bus relay	Relay-based communication (legacy mode)

If your organization uses a proxy server, configure it in the SHIR Configuration Manager under Settings → Proxy. The SHIR supports HTTP proxies with basic authentication. NTLM proxies require additional configuration via the diahost.exe.config file.
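The rule table can be turned into a repeatable checklist. The sketch below renders one PowerShell `Test-NetConnection` command per rule, to be run on the SHIR machine. Wildcard destinations cannot be probed directly, so the hostnames here are placeholders (assumptions) that you would replace with your own endpoints. `OutboundRule` and `render_checks` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutboundRule:
    host: str      # concrete hostname or IP; wildcards cannot be probed
    port: int
    purpose: str


def render_checks(rules):
    """Render one PowerShell Test-NetConnection command per outbound rule."""
    return [
        f"Test-NetConnection -ComputerName {r.host} -Port {r.port}  # {r.purpose}"
        for r in rules
    ]


# Placeholder hosts: substitute the namespace, storage account, and
# database server that your factory actually uses.
RULES = [
    OutboundRule("yournamespace.servicebus.windows.net", 443, "ADF communication"),
    OutboundRule("yourstorage.blob.core.windows.net", 443, "staging"),
    OutboundRule("download.microsoft.com", 443, "auto-update"),
    OutboundRule("192.168.1.50", 1433, "SQL Server"),
]

for line in render_checks(RULES):
    print(line)
```

Keeping the rules as data makes it easy to hand the same list to the network team and to re-run the checks after a firewall change.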

Performance Tuning

Remember: all data flows through the SHIR machine's network. If you're copying 100GB from on-premises SQL Server to Azure Blob Storage, that 100GB travels from SQL Server to the SHIR VM, then from the SHIR VM up to Azure. The SHIR's network bandwidth is your bottleneck.
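A back-of-the-envelope estimate shows why bandwidth dominates. The sketch below converts a data volume and a link speed into hours. The `efficiency` factor (the share of nominal bandwidth actually achieved) is an assumption for illustration, not a measured value.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to move size_gb over a link of link_mbps (megabits/s).

    efficiency is the assumed fraction of nominal bandwidth that is usable.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    megabits = size_gb * 1000 * 8              # GB -> megabits (decimal units)
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


for mbps in (100, 1000):
    print(f"100 GB over {mbps} Mbps: ~{transfer_hours(100, mbps):.1f} h")
```

Under these assumptions, 100GB takes a little over three hours on a 100 Mbps link and roughly twenty minutes on 1 Gbps. The slower of the two legs (source to SHIR, SHIR to Azure) sets the pace.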

Increase the concurrent job limit. By default, the SHIR caps the number of concurrent jobs per node based on the machine's resources. You can raise the cap per node from the ADF portal, on the Nodes tab of the integration runtime. A machine with 8 cores can typically handle 16-24 concurrent jobs.
Tune parallel copies rather than DIUs. Data Integration Units (DIUs) apply only to copies that run on the Azure IR. On a self-hosted IR, throughput depends on the machine's resources, the number of nodes, and the Copy Activity's parallelCopies setting. For small tables (<1GB), keep parallelCopies low, since extra parallelism only adds overhead when the SHIR is the bottleneck.
Use staging for large copies. Enable staging via Azure Blob Storage in your Copy Activity. With compression enabled in the staging settings, the SHIR compresses data before uploading, which can cut transfer time by 30-50% on large, compressible datasets.
Partition your queries. Instead of one SELECT * FROM large_table, configure physical partitioning in the Copy Activity source. ADF runs multiple parallel queries and copies partitions simultaneously.
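The staging and partitioning tips map onto a handful of Copy Activity properties. The sketch below builds that fragment of the activity's `typeProperties` as a Python dictionary. The helper name, `StagingBlobLS`, and the staging path are hypothetical, and the sink is omitted, so treat this as a partial fragment under those assumptions rather than a complete activity definition.

```python
import json


def copy_tuning_fragment(staging_ls: str, staging_path: str,
                         parallel_copies: int = 4) -> dict:
    """Build the tuning-related part of a Copy Activity's typeProperties."""
    if parallel_copies < 1:
        raise ValueError("parallel_copies must be at least 1")
    return {
        "source": {
            "type": "SqlServerSource",
            # Read each physical partition of the table with its own query.
            "partitionOption": "PhysicalPartitionsOfTable",
        },
        # Upper bound on how many partitions are copied at the same time.
        "parallelCopies": parallel_copies,
        "enableStaging": True,
        "stagingSettings": {
            "linkedServiceName": {
                "referenceName": staging_ls,
                "type": "LinkedServiceReference",
            },
            "path": staging_path,
            # Compress on the SHIR before uploading to the staging store.
            "enableCompression": True,
        },
    }


fragment = copy_tuning_fragment("StagingBlobLS", "staging/shir")
print(json.dumps(fragment, indent=2))
```

Physical partitioning only helps when the source table is actually partitioned. For unpartitioned tables, the connector's range-based partition option is the usual alternative.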

Common Errors and Fixes

"The self-hosted integration runtime is offline"

This is the most common error. Causes, in order of likelihood:

Firewall blocking outbound port 443 to *.servicebus.windows.net. Test with Test-NetConnection -ComputerName yournamespace.servicebus.windows.net -Port 443.
Authentication key expired or rotated. Re-register the SHIR with the current key from ADF → Manage → Integration runtimes.
Windows service stopped. Check if "Microsoft Integration Runtime" service is running. Restart it.
Proxy misconfiguration. If you recently changed proxy settings, the SHIR may fail to connect. Verify in Configuration Manager.

"Unable to connect to remote server"

This typically means the SHIR can't reach your on-premises data source. Check:

Can the SHIR machine ping or telnet to the data source IP and port?
Is the database firewall allowing connections from the SHIR machine's IP?
Are the credentials in the linked service correct and not expired?
If using Windows Authentication, does the SHIR service account have database access?
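The first check in the list can be scripted when telnet isn't installed. This minimal sketch, run on the SHIR machine, classifies a TCP connection attempt so that a refused connection (host reachable, nothing listening) can be told apart from a timeout (often a firewall silently dropping packets). `probe` is a hypothetical helper name.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt: 'open', 'refused', 'timeout', or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"        # host answered, but nothing listens on that port
    except socket.timeout:
        return "timeout"        # no answer within the timeout window
    except OSError as exc:
        return f"error: {exc}"  # DNS failure, no route to host, and similar
```

For example, `probe("192.168.1.50", 1433)` returning `"timeout"` suggests checking firewall rules between the SHIR and the database, while `"refused"` suggests checking that SQL Server is listening on that port.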

"Connection timed out" during copy

Usually a network bandwidth issue. The SHIR machine's NIC can't push data fast enough. Remedies: upgrade to a faster NIC, enable compression in the Copy Activity, or reduce the table size with a WHERE clause filter.

Upgrade Management

Microsoft releases SHIR updates monthly. By default, auto-update is enabled and the SHIR updates itself during a maintenance window you configure (default: 2 AM - 6 AM). During updates, running activities complete but new activities queue until the update finishes (typically 5-10 minutes).

For tightly controlled environments, disable auto-update and manage upgrades manually. You can be up to 4 versions behind the latest before ADF starts warning you. More than 4 versions behind and some features may stop working. For teams building metadata-driven pipelines, see our REST API to Snowflake pipeline guide.

Key Takeaways

SHIR is required for any data source behind a firewall. On-premises databases, private VNets, VPN-only endpoints all need it.
All data flows through the SHIR machine. Size it for your data volumes -- a 4-core/8GB VM moving 500GB nightly will struggle.
Multi-node setup eliminates single points of failure. Use the same auth key on 2-4 VMs for automatic failover.
Port 443 outbound to *.servicebus.windows.net is non-negotiable. If your firewall blocks it, the SHIR can't talk to ADF.
"Integration runtime is offline" almost always means a firewall or credential issue. Check outbound connectivity first.
Enable staging for large copies. Compression during staging can cut transfer times by 30-50%.

Chandra Sekhar, Senior ETL Engineer

Chandra Sekhar is a Senior ETL Engineer at CelestInfo specializing in Talend, Azure Data Factory, and building high-performance data integration pipelines.

Frequently Asked Questions

Q: When do I need a self-hosted integration runtime in ADF?

You need a self-hosted IR when connecting to on-premises databases (SQL Server, Oracle, MySQL), resources inside a private VNet without public endpoints, S3 buckets accessible only via VPN, or file shares on your corporate network. Any data source that isn't directly reachable from Azure's managed integration runtime requires a self-hosted IR.

Q: What are the hardware requirements for self-hosted IR?

Minimum: Windows Server 2016+, .NET Framework 4.7.2, 4 CPU cores, 8GB RAM, 80GB disk. For production workloads copying large data volumes, plan for 8+ cores, 16GB+ RAM, and SSD storage. The machine must have outbound access on port 443 to Azure, on 1433 (or your database's port) to the data source, and on 9350-9354 if you use relay-based communication.

Q: Why does my self-hosted IR show as offline?

Common causes include firewall blocking outbound port 443 to Azure Service Bus endpoints, expired authentication keys, the Windows service being stopped, or proxy misconfiguration. Start by verifying outbound connectivity on port 443, then check the IR Configuration Manager logs on the SHIR machine for specific error details.

Q: Does data flow through the self-hosted IR machine during copy operations?

Yes. The SHIR processes and transfers data on the machine where it runs. Copying 100GB from on-premises SQL Server to Azure means that traffic flows through the SHIR machine's network interface. The machine's bandwidth, CPU, and memory directly affect copy performance, so size the VM accordingly.
