A mid-size health insurance provider was processing 180,000 claims per month through a mainframe-era batch system that ran overnight. Claims adjudication took 14 business days on average - and that was in a good month. When volumes spiked during open enrollment, the backlog ballooned past 30 days. Providers in the network were openly threatening to leave.
The operations team couldn't even answer a basic question: "How many claims are in the queue right now?" That number came from a weekly Excel report assembled manually by a senior analyst, and it was always 3-5 days stale by the time leadership saw it. The mainframe's DB2 tables weren't queryable by anyone outside of two COBOL developers, both of whom had been with the company since 1998. The data team had zero real-time visibility, and every decision was based on gut instinct and spreadsheets that didn't agree with each other.
We built an Azure Data Factory pipeline that extracts claims data from the mainframe DB2 system via a self-hosted Integration Runtime. This was non-negotiable - the mainframe sat behind a firewall with no public endpoint, so we deployed the IR on an on-prem VM with connectivity to both DB2 and Azure. ADF pulls claims records on a 15-minute schedule, stages them in ADLS Gen2 as Parquet files (partitioned by claim_date and claim_type), and loads them into Snowflake.
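The 15-minute pull and the partition layout can be sketched as follows. This is a simplified illustration, not the client's actual schema: the table name `CLAIMS`, the column `LAST_UPDATE_TS`, and the path scheme are all assumptions.

```python
from datetime import date, datetime


def extract_query(last_watermark: datetime, window_end: datetime) -> str:
    """Incremental DB2 pull for one 15-minute window.
    Table and column names are illustrative."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        "SELECT * FROM CLAIMS "
        f"WHERE LAST_UPDATE_TS > TIMESTAMP('{last_watermark.strftime(fmt)}') "
        f"AND LAST_UPDATE_TS <= TIMESTAMP('{window_end.strftime(fmt)}')"
    )


def staging_path(claim_date: date, claim_type: str, run_id: str) -> str:
    """Hive-style partition path in ADLS Gen2, matching the
    claim_date / claim_type partitioning described above."""
    return (
        f"claims/staged/claim_date={claim_date.isoformat()}"
        f"/claim_type={claim_type}/part-{run_id}.parquet"
    )
```

Each ADF run copies one window's rows into the corresponding partition path, and the Snowflake load picks files up from there - partition pruning on `claim_date` keeps downstream queries cheap.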
Here's where it gets interesting. We didn't try to replace the mainframe's adjudication engine - that's a 3-year project nobody had budget for. Instead, we reimplemented the 47 core claims adjudication rules as a series of Snowflake stored procedures and dbt models that run a parallel validation pass. Every claim gets scored by our Snowflake pipeline before the mainframe processes it overnight. If our rules flag a claim as likely to fail adjudication (missing provider NPI, invalid diagnosis code, duplicate submission), it gets routed to a human reviewer immediately rather than sitting in the mainframe queue for 14 days only to bounce back.
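To make the pattern concrete, here is a minimal sketch of what a few of those rules look like as pure functions - not the actual Snowflake stored procedures, and the ICD-10 pattern is deliberately simplified. Field names (`provider_npi`, `diagnosis_code`, `claim_id`) are assumptions for illustration.

```python
import re

# Simplified ICD-10-CM shape: letter, two alphanumerics, optional subcode.
VALID_ICD10 = re.compile(r"^[A-TV-Z][0-9][0-9AB](\.[0-9A-Z]{1,4})?$")


def check_npi(claim):
    """NPIs are 10-digit identifiers; anything else fails adjudication."""
    npi = claim.get("provider_npi", "")
    if not (npi.isdigit() and len(npi) == 10):
        return "missing or malformed provider NPI"


def check_diagnosis(claim):
    if not VALID_ICD10.match(claim.get("diagnosis_code", "")):
        return "invalid diagnosis code"


def check_duplicate(claim, seen_ids):
    if claim["claim_id"] in seen_ids:
        return "duplicate submission"


def score_claim(claim, seen_ids):
    """Run every rule; a non-empty result means route to a human
    reviewer now instead of waiting for the overnight batch."""
    reasons = [r for r in (check_npi(claim),
                           check_diagnosis(claim),
                           check_duplicate(claim, seen_ids)) if r]
    seen_ids.add(claim["claim_id"])
    return reasons
```

The real rules live in SQL so they run where the data lands, but the shape is the same: each rule is independent, returns a named rejection reason, and a claim's full reason list drives the routing decision.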
On the reporting side, we put a Power BI dashboard in front of operations managers showing real-time queue depth, average processing time by claim type, top rejection reasons, and bottleneck stages. They went from checking a stale spreadsheet once a week to refreshing a live dashboard whenever they wanted.
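The dashboard's headline figures are plain aggregations over the claims table. A minimal sketch, with invented field names, of what the semantic layer computes:

```python
from collections import Counter
from statistics import mean


def queue_metrics(claims):
    """Aggregate the figures the ops dashboard surfaces.
    `claims` is a list of dicts; field names are illustrative."""
    queue_depth = sum(1 for c in claims if c["status"] == "queued")

    # Average processing time, grouped by claim type.
    by_type = {}
    for c in claims:
        if c.get("processing_days") is not None:
            by_type.setdefault(c["claim_type"], []).append(c["processing_days"])

    return {
        "queue_depth": queue_depth,
        "avg_processing_days": {t: mean(v) for t, v in by_type.items()},
        "top_rejections": Counter(
            c["rejection_reason"] for c in claims if c.get("rejection_reason")
        ).most_common(3),
    }
```

In production these are Power BI measures over Snowflake views, refreshed continuously - the Python version is just to show that none of this is exotic math. The hard part was getting trustworthy data, not computing on it.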
The biggest lesson here: you don't have to replace the mainframe to get massive value from modern data infrastructure. The "parallel validation" pattern - running claims through a Snowflake-based rules engine alongside the mainframe - delivered 73% faster processing without touching a single line of COBOL. That made the business case for the eventual mainframe migration much easier to sell, because leadership could see the value of the modern stack running next to the legacy one.
The self-hosted Integration Runtime was the trickiest part of the whole project. DB2 JDBC connectivity from Azure requires very specific driver versions, and the connection pooling defaults in ADF's IR aren't tuned for mainframe workloads. We had to set maxPoolSize=5 and connectionTimeout=120 in the linked service configuration to keep things stable. If you're connecting ADF to DB2, start there - the defaults will time out on you.
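For orientation, those settings sat in the linked service definition. The JSON below shows the general shape only - server, database, and credential values are placeholders, and depending on your connector version the pooling settings may belong elsewhere than the connection string, so verify against the current DB2 linked service documentation rather than copying this verbatim:

```json
{
  "name": "MainframeDb2",
  "properties": {
    "type": "Db2",
    "connectVia": {
      "referenceName": "OnPremSelfHostedIR",
      "type": "IntegrationRuntimeReference"
    },
    "typeProperties": {
      "connectionString": "server=mainframe.internal:50000;database=CLAIMSDB;authenticationType=Basic;username=etl_user;maxPoolSize=5;connectionTimeout=120"
    }
  }
}
```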
One gotcha we didn't expect: the mainframe's claim status codes didn't match the documentation. The DB2 table had 23 distinct status values; the internal wiki listed 14. We spent two days mapping the undocumented ones by cross-referencing COBOL source code. Plan for that kind of data archeology when you're working with systems that predate version control.
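The reconciliation step itself is trivial - dump the distinct status values from DB2, diff them against the wiki's list, and the remainder is your COBOL reading list. The code values below are invented stand-ins:

```python
# Status codes the internal wiki documents (invented examples).
documented = {"01", "02", "03", "05", "10", "20", "30"}

# What SELECT DISTINCT on the status column actually returned
# (a few invented values stand in for the real 23).
observed = {"01", "02", "03", "05", "10", "20", "30", "2A", "7X", "Z9"}

# Everything in the live data but not the docs has to be traced
# through COBOL source by hand.
undocumented = sorted(observed - documented)
print(undocumented)  # ['2A', '7X', 'Z9']
```

Two days of work for a three-line diff - the diff takes seconds, the archeology behind each undocumented value is what costs you.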
Processing Claims Slower Than You Should Be?
Whether it's a mainframe bottleneck, a reporting gap, or a batch system that can't keep up - we've been here before. Let's talk about what's realistic for your setup.
Start a Conversation