
AWS IAM for Data Engineers: Least-Privilege Policies That Actually Work
Quick answer: Every AWS data service (Glue, Lambda, Redshift, EMR) needs an IAM role with only the permissions it requires. That means specifying exact S3 bucket ARNs instead of s3:*, exact Glue database names instead of glue:*, and using conditions to restrict by region or IP. This article includes 3 copy-paste-ready IAM policy JSON examples for the most common data engineering scenarios.
Last updated: April 2025
IAM Fundamentals for Data Engineers
IAM isn't glamorous, but it's the reason your Glue job can read from S3, your Lambda can write to DynamoDB, and your Redshift cluster can access external data. Every AWS service interaction is an API call, and every API call is authorized (or denied) by IAM. Getting IAM right means your pipelines work reliably. Getting it wrong means either "Access Denied" errors at 2 AM or -- worse -- overly permissive policies that expose your data lake to every service in the account.
The four IAM concepts data engineers use daily:
- Users: Human identities with long-term credentials. Use for console access; avoid for service-to-service communication.
- Roles: Temporary identity that services assume. Every Glue job, Lambda function, and ECS task should run under its own role.
- Policies: JSON documents that define what actions are allowed or denied on which resources. Attached to users or roles.
- STS AssumeRole: How a service in Account A gets temporary credentials to access resources in Account B. Critical for cross-account data access (the trust policy sketch after this list shows the other half of this handshake).
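A role has two halves: a permissions policy (what the role can do) and a trust policy (who can assume it). As a minimal sketch, the trust policy below lets the Glue service assume a role; swap the service principal (e.g., lambda.amazonaws.com) for other services:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowGlueToAssumeThisRole",
      "Effect": "Allow",
      "Principal": { "Service": "glue.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
Every permissions policy shown below attaches to a role whose trust policy looks roughly like this.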
Policy Example 1: Glue Job with S3 and Catalog Access
This policy grants a Glue ETL job read access to a source S3 bucket, write access to a target S3 bucket, and read/write access to specific Glue Catalog databases. For streamlined role creation, see our IAM role creation guide with AWS Policy Generator.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadSourceBucket",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::raw-data-source-bucket",
        "arn:aws:s3:::raw-data-source-bucket/*"
      ]
    },
    {
      "Sid": "WriteTargetBucket",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:DeleteObject"],
      "Resource": [
        "arn:aws:s3:::processed-data-target-bucket/*"
      ]
    },
    {
      "Sid": "GlueCatalogAccess",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase", "glue:GetTable",
        "glue:GetTables", "glue:GetPartitions",
        "glue:CreateTable", "glue:UpdateTable",
        "glue:BatchCreatePartition"
      ],
      "Resource": [
        "arn:aws:glue:us-east-1:123456789012:catalog",
        "arn:aws:glue:us-east-1:123456789012:database/analytics_db",
        "arn:aws:glue:us-east-1:123456789012:table/analytics_db/*"
      ]
    },
    {
      "Sid": "CloudWatchLogs",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup", "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws-glue/*"
    }
  ]
}
Notice: no s3:*, no Resource: "*". Each action is scoped to the exact bucket and Glue database this job needs. If the job changes to read from a new bucket, you add a specific resource ARN -- you don't widen the policy.
Policy Example 2: Lambda with DynamoDB and S3
A Lambda function that reads events from S3, processes them, and writes results to a DynamoDB table. The function also needs to log to CloudWatch.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadS3Events",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::event-data-bucket/incoming/*"
    },
    {
      "Sid": "WriteDynamoDB",
      "Effect": "Allow",
      "Action": [
        "dynamodb:PutItem", "dynamodb:UpdateItem",
        "dynamodb:BatchWriteItem"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/ProcessedEvents"
    },
    {
      "Sid": "BasicLambdaExecution",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup", "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:us-east-1:123456789012:*"
    }
  ]
}
This Lambda can't read from any other S3 bucket, can't delete DynamoDB items, and can't invoke other Lambda functions. That's the point.
Policy Example 3: Cross-Account Redshift Access
Account A (data producer) has an S3 bucket with analytics data. Account B (data consumer) has a Redshift cluster that needs to COPY data from Account A's bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AssumeRoleInProducerAccount",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::111111111111:role/S3DataShareRole"
    },
    {
      "Sid": "RedshiftCopyFromS3",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject", "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::producer-analytics-bucket",
        "arn:aws:s3:::producer-analytics-bucket/shared/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "us-east-1"
        }
      }
    }
  ]
}
Account A's bucket policy must also grant access to Account B's role ARN. The Condition block restricts access to a specific region -- a useful guardrail that prevents accidental cross-region data movement. For broader access control patterns, see our data access control strategies guide.
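As a sketch, Account A's bucket policy might look like the following, assuming the consumer is account 222222222222 and its Redshift role is named RedshiftCopyRole (both are illustrative placeholders):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowConsumerRoleRead",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::222222222222:role/RedshiftCopyRole"
      },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::producer-analytics-bucket",
        "arn:aws:s3:::producer-analytics-bucket/shared/*"
      ]
    }
  ]
}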
S3 Access Points for Multi-Team Patterns
When multiple teams need access to the same S3 bucket with different permissions, S3 Access Points simplify the bucket policy. Instead of one complex bucket policy with 15 principal ARNs and conditions, create an access point per team. Each access point has its own policy that scopes access to specific prefixes.
Example: the marketing team gets an access point that allows read-only access to s3://data-lake/marketing/*. The finance team gets an access point for s3://data-lake/finance/*. Neither team can see the other's data. This is cleaner than trying to manage it all in one bucket policy.
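As a sketch, the marketing team's access point policy might look like this, assuming an access point named marketing-ap and a team role named MarketingAnalystRole (both illustrative):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MarketingReadOnly",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/MarketingAnalystRole"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:us-east-1:123456789012:accesspoint/marketing-ap/object/marketing/*"
    }
  ]
}
A common companion pattern is a bucket policy that delegates access control to the account's access points via the s3:DataAccessPointAccount condition key, so per-team rules live entirely in the access point policies.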
Policy Conditions: The Underused Power Feature
Conditions let you add guardrails beyond just actions and resources:
- aws:RequestedRegion: Restrict actions to specific AWS regions. Prevents accidental resource creation in us-west-2 when your team works in us-east-1.
- aws:SourceIp: Allow API calls only from your corporate IP range. Useful for human users, not recommended for services.
- s3:prefix: Limit S3 ListBucket to specific key prefixes. A team that needs /marketing/ doesn't need to list the entire bucket (a combined example follows this list).
- aws:PrincipalTag: Match against tags on the calling principal. Tag-based access control scales better than maintaining long lists of role ARNs.
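As a sketch, here is a statement combining two of these keys: ListBucket scoped to the marketing/ prefix and pinned to one region (the bucket name is illustrative):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListMarketingPrefixOnly",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::data-lake",
      "Condition": {
        "StringLike": { "s3:prefix": "marketing/*" },
        "StringEquals": { "aws:RequestedRegion": "us-east-1" }
      }
    }
  ]
}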
Common Mistakes
- Using "Resource": "*" everywhere. This grants the action on every resource in your account. It's the IAM equivalent of chmod 777. Always specify resource ARNs.
- Attaching AdministratorAccess to services. We've seen Glue jobs running with AdministratorAccess because "it was easier during development." That Glue job can now delete your production databases, modify IAM roles, and spin up EC2 instances. Never do this.
- Not using conditions. A policy that allows s3:PutObject on a bucket is fine. A policy that also restricts it to your region and requires encryption (s3:x-amz-server-side-encryption) is better; see the sketch after this list.
- Forgetting CloudWatch Logs permissions. Every service needs log access. Without it, your Glue jobs and Lambdas run but you can't see what they did. Include log permissions in every service role.
- Ignoring the 6,144-character policy limit. Managed policies max out at 6,144 characters (whitespace excluded). If your policy is approaching this limit, split it into multiple managed policies (up to 10 per role) or use wildcards more strategically.
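To make the encryption condition concrete, here is a sketch of a Deny statement that rejects uploads missing the server-side-encryption header (bucket name illustrative). Because explicit Deny always wins, no Allow elsewhere can override it:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnencryptedPuts",
      "Effect": "Deny",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::processed-data-target-bucket/*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": "true"
        }
      }
    }
  ]
}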
IAM Access Analyzer
IAM Access Analyzer is a free AWS tool that identifies overly permissive policies. Enable it in your account and it'll flag policies that grant access to external principals (other accounts, public access) or use wildcard resources where specific ARNs would be safer.
It also has a policy generation feature: point it at CloudTrail logs for a specific role, and it generates a least-privilege policy based on the actual API calls that role made over the past 90 days. This is invaluable for tightening permissions on existing roles without breaking anything. For S3 architecture that these policies protect, see our S3 data lake architecture guide.
Service Control Policies (SCPs)
If you manage multiple AWS accounts via AWS Organizations, SCPs set guardrails that apply across all accounts. Common data engineering SCPs include: deny resource creation outside approved regions, deny disabling of CloudTrail, deny public S3 bucket creation, and require encryption on all S3 objects. SCPs are the organizational safety net -- even if someone attaches AdministratorAccess to a role, the SCP Deny still blocks the restricted actions.
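As a sketch, a region-restriction SCP might look like the following, assuming us-east-1 is the only approved region. The NotAction list exempts global services and usually needs to be extended for your environment:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyOutsideApprovedRegions",
      "Effect": "Deny",
      "NotAction": [
        "iam:*",
        "organizations:*",
        "sts:*",
        "support:*"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": "us-east-1"
        }
      }
    }
  ]
}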
Key Takeaways
- Every service gets its own role with only the permissions it needs. No shared roles across Glue jobs, Lambdas, and ECS tasks.
- Specify resource ARNs, never "*". A Glue job that reads one bucket shouldn't have access to every bucket in the account.
- Use conditions (region, encryption, IP, tags) to add guardrails beyond basic action/resource scoping.
- Cross-account access = bucket policy + IAM role + STS AssumeRole. All three pieces are required.
- IAM Access Analyzer generates least-privilege policies from CloudTrail. Use it to tighten existing roles without guessing.
- Explicit Deny always wins. If you need to absolutely block an action, use a Deny statement -- no Allow anywhere can override it.
- The 6,144-character limit is real. Plan for it with multiple managed policies or strategic wildcards.
Frequently Asked Questions
Q: What is the principle of least privilege in AWS IAM?
Least privilege means granting only the minimum permissions required. A Glue job reading from one S3 bucket should have s3:GetObject on that specific bucket ARN -- not s3:* on "Resource": "*". This limits the blast radius if credentials are compromised or code has bugs.
Q: How do I set up cross-account S3 access?
Three components: (1) a bucket policy on the source bucket granting access to the consuming account's role ARN, (2) an IAM role in the consuming account whose trust policy lets the consuming service assume it, and (3) a permissions policy on that role with s3:GetObject scoped to the source bucket. The consuming service calls sts:AssumeRole to get temporary credentials.
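For component (2), a minimal sketch of the consuming role's trust policy, which lets the Redshift service assume the role when running COPY:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowRedshiftToAssumeThisRole",
      "Effect": "Allow",
      "Principal": { "Service": "redshift.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}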
Q: What is the maximum IAM policy size?
Inline policies: 2,048 characters for a user, 5,120 for a group, and 10,240 for a role. Managed policies: 6,144 characters (whitespace excluded). You can attach up to 10 managed policies per role, giving you an effective limit of ~61,440 characters. If you're hitting these limits, consider using IAM policy variables, S3 access points, or tag-based conditions to reduce policy verbosity.
Q: How does IAM policy evaluation work?
Explicit Deny always wins. AWS evaluates: Organization SCPs first, then resource-based policies, then identity-based policies. If any policy at any level explicitly denies the action, the request is denied regardless of Allow statements elsewhere. This makes Deny the most reliable way to enforce restrictions across an organization.
