Azure Data Factory Self-Hosted Integration Runtime: Setup, Troubleshooting, and Best Practices
Quick answer: The self-hosted integration runtime (SHIR) lets Azure Data Factory connect to on-premises databases, private VNet resources, and endpoints that aren't publicly accessible. Install it on a Windows machine with network access to your data source, register it with your ADF instance using an authentication key, and create linked services that reference the SHIR. All data flows through the SHIR machine -- plan your VM sizing accordingly.
Last updated: May 2025
When You Need a Self-Hosted IR
Azure Data Factory's managed (Azure-hosted) integration runtime handles connections to cloud services with public endpoints -- Azure SQL Database, Blob Storage, Snowflake, and similar. But plenty of enterprise data still lives behind firewalls. The self-hosted IR bridges that gap. You'll need one when connecting to:
- On-premises databases: SQL Server, Oracle, MySQL, PostgreSQL, DB2 running in your data center
- Private VNet resources: Azure SQL Managed Instance without a public endpoint, Azure VMs with private IPs only
- S3 behind VPN: AWS S3 buckets accessible only through a VPN gateway or private link
- File shares: SMB/NFS file shares on your corporate network
- ODBC/JDBC sources: Any data source accessible only from within your network via ODBC or JDBC drivers
Installation Requirements
The SHIR is Windows-only. There's no Linux installer. Here's what the machine needs:
| Requirement | Minimum | Recommended (Production) |
|---|---|---|
| OS | Windows Server 2016 or later | Windows Server 2019/2022 |
| .NET Framework | 4.7.2 | 4.8 or later |
| CPU | 4 cores | 8+ cores |
| RAM | 8 GB | 16+ GB |
| Disk | 80 GB free | SSD recommended |
| Network | Outbound access on ports 443, 1433, 9350-9354 | Dedicated NIC for data traffic |
Don't install the SHIR on a machine running other heavy workloads. It competes for CPU and memory during copy operations, and a resource-starved SHIR is the number one cause of slow data transfers.
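As a rough illustration, the minimums in the table above can be turned into a small pre-install check. This is only a sketch: `MachineSpec` and `shortfalls` are hypothetical names, the thresholds are copied from the table, and you supply the machine's figures yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineSpec:
    """Hypothetical container for the figures in the requirements table."""
    cores: int
    ram_gb: float
    free_disk_gb: float


# Minimum column of the requirements table (4 cores, 8 GB RAM, 80 GB free disk).
MINIMUM = MachineSpec(cores=4, ram_gb=8, free_disk_gb=80)


def shortfalls(spec: MachineSpec, baseline: MachineSpec = MINIMUM) -> list[str]:
    """Return a human-readable list of requirements the machine misses."""
    problems = []
    if spec.cores < baseline.cores:
        problems.append(f"CPU: {spec.cores} cores, need {baseline.cores}")
    if spec.ram_gb < baseline.ram_gb:
        problems.append(f"RAM: {spec.ram_gb} GB, need {baseline.ram_gb} GB")
    if spec.free_disk_gb < baseline.free_disk_gb:
        problems.append(
            f"Disk: {spec.free_disk_gb} GB free, need {baseline.free_disk_gb} GB"
        )
    return problems


# Example: a 2-core / 4 GB VM with plenty of disk fails on CPU and RAM.
for problem in shortfalls(MachineSpec(cores=2, ram_gb=4, free_disk_gb=200)):
    print(problem)
```

Passing a different `baseline` (for example the production column) reuses the same check for stricter sizing.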
Step-by-Step: Installing SHIR on a Windows VM
- Create the IR in ADF. Go to your Azure Data Factory portal → Manage → Integration runtimes → New → Self-Hosted. Give it a name and copy the authentication key (Key1 or Key2).
- Download the installer. Microsoft provides a direct download link on the IR creation page. It's about 800MB. Alternatively, download from the Microsoft Integration Runtime download center.
- Run the installer on your Windows VM. Accept defaults -- the service installs as "Microsoft Integration Runtime" and starts automatically.
- Register with your ADF. The installer prompts for the authentication key. Paste the key from step 1. The IR connects to Azure and shows as "Running" in the ADF portal within 1-2 minutes.
- Verify connectivity. In the IR Configuration Manager (runs on the SHIR machine), use "Diagnostics" to test connections to your on-premises data sources.
Connecting to On-Premises SQL Server
With the SHIR installed and registered, create a linked service in ADF. If you're new to ADF pipelines, our pipeline creation guide covers the fundamentals.
{
  "name": "OnPremSqlServer",
  "properties": {
    "type": "SqlServer",
    "typeProperties": {
      "connectionString": "Server=192.168.1.50;Database=SalesDB;Integrated Security=False;User ID=adf_reader;Password=***",
      "encryptedCredential": "..."
    },
    "connectVia": {
      "referenceName": "MyOnPremSHIR",
      "type": "IntegrationRuntimeReference"
    }
  }
}
The connectVia section is the key -- it tells ADF to route this connection through your self-hosted IR instead of the Azure-managed IR. The SHIR machine must be able to reach the SQL Server on port 1433 (or whatever port your SQL Server listens on).
For credentials, you have two options: store the password in the linked service (encrypted by the SHIR's certificate) or reference an Azure Key Vault secret. Key Vault is strongly recommended for production -- it centralizes credential management and supports rotation.
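A minimal sketch of the Key Vault option: the Python below builds a linked service definition in which the password is a Key Vault secret reference instead of text in the connection string. The names `AzureKeyVaultLS` (a separate Key Vault linked service you would create first) and `sql-adf-reader-password` are placeholders invented for this example, and `sql_server_linked_service` is a hypothetical helper, not part of any SDK.

```python
import json


def sql_server_linked_service(name, server, database, user,
                              ir_name, key_vault_ls, secret_name):
    """Build a SQL Server linked service whose password lives in Key Vault."""
    return {
        "name": name,
        "properties": {
            "type": "SqlServer",
            "typeProperties": {
                # No password here; it is resolved from Key Vault at run time.
                "connectionString": (
                    f"Server={server};Database={database};"
                    f"Integrated Security=False;User ID={user};"
                ),
                "password": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": key_vault_ls,
                        "type": "LinkedServiceReference",
                    },
                    "secretName": secret_name,
                },
            },
            # Route the connection through the self-hosted IR.
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


definition = sql_server_linked_service(
    name="OnPremSqlServer",
    server="192.168.1.50",
    database="SalesDB",
    user="adf_reader",
    ir_name="MyOnPremSHIR",
    key_vault_ls="AzureKeyVaultLS",          # placeholder name
    secret_name="sql-adf-reader-password",   # placeholder name
)
print(json.dumps(definition, indent=2))
```

Rotating the password then means updating the secret in Key Vault; the linked service definition itself does not change.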
High Availability: Multi-Node Setup
A single SHIR node is a single point of failure. If that VM goes down for patching, your pipelines fail. For production, set up 2-4 SHIR nodes sharing the same authentication key. ADF automatically load-balances copy activities across available nodes and fails over if one node goes offline.
To add a node: install the SHIR software on a second Windows VM and register it with the same authentication key as the first node. ADF recognizes it as a secondary node within minutes. All nodes must have identical network access to your data sources.
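Limitation: mapping Data Flows don't run on a self-hosted IR at all, whether single-node or multi-node. They execute on Spark clusters managed by the Azure integration runtime. To use on-premises data in a Data Flow, first land it in cloud storage with a Copy Activity that runs on the SHIR, then point the Data Flow at the staged copy.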
Networking and Firewall Rules
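The SHIR makes outbound connections only; it never accepts inbound traffic from the internet. (In a multi-node setup, the nodes also talk to each other inside your network, on port 8060.) Required outbound rules: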
| Port | Destination | Purpose |
|---|---|---|
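| 443 (HTTPS) | *.servicebus.windows.net | Interactive authoring from the ADF portal (test connection, data preview) |
| 443 (HTTPS) | {datafactory}.{region}.datafactory.azure.net | Communication with the ADF service (control plane) |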
| 443 (HTTPS) | *.core.windows.net | Staging data in Azure Blob/ADLS |
| 443 (HTTPS) | download.microsoft.com | Auto-update downloads |
| 1433 | Your SQL Server IP | Database connectivity (varies by source) |
| 9350-9354 | Azure Service Bus relay | Relay-based communication (legacy mode) |
If your organization uses a proxy server, configure it in the SHIR Configuration Manager under Settings → Proxy. The SHIR supports HTTP proxies with basic authentication. NTLM proxies require additional configuration via the diahost.exe.config file.
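The outbound rules above can be spot-checked from the SHIR machine with a plain TCP connection test. This is a sketch, not an official diagnostic: `can_connect` and `check_endpoints` are hypothetical helpers, the host names are placeholders (wildcards such as *.servicebus.windows.net must be replaced with a concrete host), and a direct socket test bypasses any HTTP proxy, so a failure here does not prove the SHIR itself is blocked when it is configured to use one.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_endpoints(endpoints, timeout: float = 5.0) -> dict:
    """Map 'host:port' to reachability for each (host, port) pair."""
    return {
        f"{host}:{port}": can_connect(host, port, timeout)
        for host, port in endpoints
    }


# Example usage (placeholder hosts; run on the SHIR machine):
#
# results = check_endpoints([
#     ("yournamespace.servicebus.windows.net", 443),
#     ("download.microsoft.com", 443),
#     ("192.168.1.50", 1433),
# ])
# for target, ok in results.items():
#     print(f"{target}: {'reachable' if ok else 'BLOCKED'}")
```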
Performance Tuning
Remember: all data flows through the SHIR machine's network. If you're copying 100GB from on-premises SQL Server to Azure Blob Storage, that 100GB travels from SQL Server to the SHIR VM, then from the SHIR VM up to Azure. The SHIR's network bandwidth is your bottleneck.
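To put numbers on that bottleneck, here is a back-of-the-envelope estimate of how long a copy takes over the slower of the two network legs. The function name and the 70% default efficiency factor are assumptions for illustration; real throughput also depends on the source database, the number of parallel copies, and compression.

```python
def transfer_minutes(size_gb: float, link_mbps: float,
                     efficiency: float = 0.7) -> float:
    """Estimate minutes needed to move size_gb over a link of link_mbps.

    efficiency is the assumed fraction of nominal bandwidth actually achieved.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    megabits = size_gb * 1000 * 8          # decimal GB -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 60


# 100 GB over a 1 Gbps link at 70% efficiency: roughly 19 minutes.
print(f"{transfer_minutes(100, 1000):.1f} min")
# The same copy over a 100 Mbps link: a little over 3 hours.
print(f"{transfer_minutes(100, 100):.1f} min")
```

Use the bandwidth of the slowest leg (data source to SHIR, or SHIR to Azure) as `link_mbps`, since that leg bounds the whole copy.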
- Increase concurrent job limit. By default, the SHIR limits concurrent copy activities based on the machine's core count. In the Configuration Manager, you can increase this under Settings → Concurrent Jobs. A machine with 8 cores can typically handle 16-24 concurrent jobs.
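- Tune parallel copies rather than DIUs. Data Integration Units (DIUs) apply only to copies that run on the Azure integration runtime, so the DIU setting has no effect when a Copy Activity runs on the SHIR. Throughput there depends on the machine's own CPU, memory, and network, plus the Copy Activity's parallelCopies setting. For small tables (<1GB), a low parallelCopies value is typically sufficient; higher values mostly add contention on the SHIR.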
- Use staging for large copies. Enable staging via Azure Blob Storage in your Copy Activity. The SHIR compresses data before uploading, which can cut transfer time by 30-50% on large datasets.
- Partition your queries. Instead of one SELECT * FROM large_table, configure physical partitioning in the Copy Activity source. ADF runs multiple parallel queries and copies partitions simultaneously.
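The staging and partitioning tips above map to a few Copy Activity properties. The sketch below builds only that part of the activity's typeProperties; the sink is omitted because it depends on your destination. `copy_tuning_fragment` is a hypothetical helper, and `StagingBlobLS` and `adf-staging/sales` are placeholder names for a staging linked service and path.

```python
import json
from typing import Optional


def copy_tuning_fragment(staging_linked_service: str, staging_path: str,
                         partition_column: Optional[str] = None,
                         parallel_copies: int = 4) -> dict:
    """Build the source, staging, and parallelism part of a Copy Activity."""
    if parallel_copies < 1:
        raise ValueError("parallel_copies must be at least 1")

    source = {"type": "SqlServerSource"}
    if partition_column is None:
        # Use the table's existing physical partitions.
        source["partitionOption"] = "PhysicalPartitionsOfTable"
    else:
        # Split an unpartitioned table into ranges of the given column.
        source["partitionOption"] = "DynamicRange"
        source["partitionSettings"] = {"partitionColumnName": partition_column}

    return {
        "source": source,
        "parallelCopies": parallel_copies,
        "enableStaging": True,
        "stagingSettings": {
            "linkedServiceName": {
                "referenceName": staging_linked_service,
                "type": "LinkedServiceReference",
            },
            "path": staging_path,
            "enableCompression": True,  # compress before uploading
        },
    }


fragment = copy_tuning_fragment("StagingBlobLS", "adf-staging/sales")
print(json.dumps(fragment, indent=2))
```

Passing `partition_column="OrderID"` (another placeholder) switches the source to dynamic range partitioning, which is the usual choice when the table has no physical partitions.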
Common Errors and Fixes
"The self-hosted integration runtime is offline"
This is the most common error. Causes, in order of likelihood:
- Firewall blocking outbound port 443 to *.servicebus.windows.net. Test with Test-NetConnection -ComputerName yournamespace.servicebus.windows.net -Port 443.
- Authentication key expired or rotated. Re-register the SHIR with the current key from ADF → Manage → Integration runtimes.
- Windows service stopped. Check if "Microsoft Integration Runtime" service is running. Restart it.
- Proxy misconfiguration. If you recently changed proxy settings, the SHIR may fail to connect. Verify in Configuration Manager.
"Unable to connect to remote server"
This typically means the SHIR can't reach your on-premises data source. Check:
- Can the SHIR machine ping or telnet to the data source IP and port?
- Is the database firewall allowing connections from the SHIR machine's IP?
- Are the credentials in the linked service correct and not expired?
- If using Windows Authentication, does the SHIR service account have database access?
"Connection timed out" during copy
Usually a network bandwidth issue. The SHIR machine's NIC can't push data fast enough. Remedies: upgrade to a faster NIC, enable compression in the Copy Activity, or reduce the table size with a WHERE clause filter.
Upgrade Management
Microsoft releases SHIR updates monthly. By default, auto-update is enabled and the SHIR updates itself during a maintenance window you configure (default: 2 AM - 6 AM). During updates, running activities complete but new activities queue until the update finishes (typically 5-10 minutes).
For tightly controlled environments, disable auto-update and manage upgrades manually. You can be up to 4 versions behind the latest before ADF starts warning you. More than 4 versions behind and some features may stop working. For teams building metadata-driven pipelines, see our REST API to Snowflake pipeline guide.
Key Takeaways
- SHIR is required for any data source behind a firewall. On-premises databases, private VNets, VPN-only endpoints all need it.
- All data flows through the SHIR machine. Size it for your data volumes -- a 4-core/8GB VM moving 500GB nightly will struggle.
- Multi-node setup eliminates single points of failure. Use the same auth key on 2-4 VMs for automatic failover.
- Port 443 outbound to Azure is non-negotiable. If your firewall blocks *.servicebus.windows.net or your data factory's service endpoint, the SHIR can't talk to ADF.
- "Integration runtime is offline" almost always means a firewall or credential issue. Check outbound connectivity first.
- Enable staging for large copies. Compression during staging can cut transfer times by 30-50%.
Frequently Asked Questions
Q: When do I need a self-hosted integration runtime in ADF?
You need a self-hosted IR when connecting to on-premises databases (SQL Server, Oracle, MySQL), resources inside a private VNet without public endpoints, S3 buckets accessible only via VPN, or file shares on your corporate network. Any data source that isn't directly reachable from Azure's managed integration runtime requires a self-hosted IR.
Q: What are the hardware requirements for self-hosted IR?
Minimum: Windows Server 2016+, .NET Framework 4.7.2, 4 CPU cores, 8GB RAM, 80GB disk. For production workloads copying large data volumes, we recommend 8+ cores, 16GB+ RAM, and SSD storage. The machine needs outbound access on port 443 to Azure, plus access to each data source on its own port (for example, 1433 for SQL Server); ports 9350-9354 matter only for legacy relay-based communication.
Q: Why does my self-hosted IR show as offline?
Common causes include firewall blocking outbound port 443 to Azure Service Bus endpoints, expired authentication keys, the Windows service being stopped, or proxy misconfiguration. Start by verifying outbound connectivity on port 443, then check the IR Configuration Manager logs on the SHIR machine for specific error details.
Q: Does data flow through the self-hosted IR machine during copy operations?
Yes. The SHIR processes and transfers data on the machine where it runs. Copying 100GB from on-premises SQL Server to Azure means that traffic flows through the SHIR machine's network interface. The machine's bandwidth, CPU, and memory directly affect copy performance, so size the VM accordingly.
