AWS IAM for Data Engineers: Least-Privilege Policies That Actually Work

Celestinfo Software Solutions Pvt. Ltd. Apr 10, 2025

Quick answer: Every AWS data service (Glue, Lambda, Redshift, EMR) needs an IAM role with only the permissions it requires. That means specifying exact S3 bucket ARNs instead of s3:*, exact Glue database names instead of glue:*, and using conditions to restrict by region or IP. This article includes 3 copy-paste-ready IAM policy JSON examples for the most common data engineering scenarios.

Last updated: April 2025

IAM Fundamentals for Data Engineers

IAM isn't glamorous, but it's the reason your Glue job can read from S3, your Lambda can write to DynamoDB, and your Redshift cluster can access external data. Every AWS service interaction is an API call, and every API call is authorized (or denied) by IAM. Getting IAM right means your pipelines work reliably. Getting it wrong means either "Access Denied" errors at 2 AM or -- worse -- overly permissive policies that expose your data lake to every service in the account.


The four IAM concepts data engineers use daily:

- Roles -- the identities your Glue jobs, Lambda functions, and Redshift clusters assume to make API calls.
- Identity-based policies -- the JSON documents attached to those roles that list allowed actions and resources.
- Resource-based policies -- policies attached to the resource itself, such as S3 bucket policies, which are essential for cross-account access.
- Conditions -- extra constraints (region, IP, TLS) that narrow when a statement applies.


Policy Example 1: Glue Job with S3 and Catalog Access


This policy grants a Glue ETL job read access to a source S3 bucket, write access to a target S3 bucket, and read/write access to specific Glue Catalog databases. For streamlined role creation, see our IAM role creation guide with AWS Policy Generator.


JSON -- Glue Job IAM Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadSourceBucket",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::raw-data-source-bucket",
        "arn:aws:s3:::raw-data-source-bucket/*"
      ]
    },
    {
      "Sid": "WriteTargetBucket",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:DeleteObject"],
      "Resource": [
        "arn:aws:s3:::processed-data-target-bucket/*"
      ]
    },
    {
      "Sid": "GlueCatalogAccess",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase", "glue:GetTable",
        "glue:GetTables", "glue:GetPartitions",
        "glue:CreateTable", "glue:UpdateTable",
        "glue:BatchCreatePartition"
      ],
      "Resource": [
        "arn:aws:glue:us-east-1:123456789012:catalog",
        "arn:aws:glue:us-east-1:123456789012:database/analytics_db",
        "arn:aws:glue:us-east-1:123456789012:table/analytics_db/*"
      ]
    },
    {
      "Sid": "CloudWatchLogs",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup", "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws-glue/*"
    }
  ]
}

Notice: no s3:*, no Resource: "*". Each action is scoped to the exact bucket and Glue database this job needs. If the job changes to read from a new bucket, you add a specific resource ARN -- you don't widen the policy.


Policy Example 2: Lambda with DynamoDB and S3


A Lambda function that reads events from S3, processes them, and writes results to a DynamoDB table. The function also needs to log to CloudWatch.


JSON -- Lambda Execution Role Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadS3Events",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::event-data-bucket/incoming/*"
    },
    {
      "Sid": "WriteDynamoDB",
      "Effect": "Allow",
      "Action": [
        "dynamodb:PutItem", "dynamodb:UpdateItem",
        "dynamodb:BatchWriteItem"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/ProcessedEvents"
    },
    {
      "Sid": "BasicLambdaExecution",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup", "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:us-east-1:123456789012:*"
    }
  ]
}

This Lambda can't read from any other S3 bucket, can't delete DynamoDB items, and can't invoke other Lambda functions. That's the point.
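One detail worth spelling out: the policy above is only the permissions half of the execution role. The role also needs a trust policy that lets the Lambda service assume it. A minimal sketch (this is the standard service-principal form; nothing here is account-specific):

JSON -- Lambda Trust Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}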


Policy Example 3: Cross-Account Redshift Access


Account A (data producer) has an S3 bucket with analytics data. Account B (data consumer) has a Redshift cluster that needs to COPY data from Account A's bucket.


JSON -- Cross-Account Role (in Account B)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AssumeRoleInProducerAccount",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::111111111111:role/S3DataShareRole"
    },
    {
      "Sid": "RedshiftCopyFromS3",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject", "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::producer-analytics-bucket",
        "arn:aws:s3:::producer-analytics-bucket/shared/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "us-east-1"
        }
      }
    }
  ]
}

Account A's bucket policy must also grant access to Account B's role ARN. The Condition block restricts access to a specific region -- a useful guardrail that prevents accidental cross-region data movement. For broader access control patterns, see our data access control strategies guide.
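As a sketch, Account A's bucket policy could look like the following. The consumer account ID (222222222222) and role name (RedshiftCopyRole) are illustrative placeholders for Account B's actual role ARN:

JSON -- Bucket Policy (in Account A, illustrative principal)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowConsumerRoleRead",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::222222222222:role/RedshiftCopyRole"
      },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::producer-analytics-bucket",
        "arn:aws:s3:::producer-analytics-bucket/shared/*"
      ]
    }
  ]
}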


S3 Access Points for Multi-Team Patterns


When multiple teams need access to the same S3 bucket with different permissions, S3 Access Points simplify the bucket policy. Instead of one complex bucket policy with 15 principal ARNs and conditions, create an access point per team. Each access point has its own policy that scopes access to specific prefixes.


Example: the marketing team gets an access point that allows read-only access to s3://data-lake/marketing/*. The finance team gets an access point for s3://data-lake/finance/*. Neither team can see the other's data. This is cleaner than trying to manage it all in one bucket policy.
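A sketch of what the marketing team's access point policy might look like -- the account ID, access point name, and role name are illustrative. Note the access point ARN format: objects are addressed through the access point with an /object/ prefix:

JSON -- Access Point Policy (marketing team, illustrative names)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MarketingReadOnly",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/MarketingAnalystRole"
      },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:us-east-1:123456789012:accesspoint/marketing-ap",
        "arn:aws:s3:us-east-1:123456789012:accesspoint/marketing-ap/object/marketing/*"
      ]
    }
  ]
}

The bucket's own policy must also delegate access control to the access points in the account for this to take effect.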


Policy Conditions: The Underused Power Feature


Conditions let you add guardrails beyond just actions and resources. Condition keys that come up most often in data pipelines:

- aws:RequestedRegion -- pin access to a specific region, as in Example 3.
- aws:SourceIp -- restrict calls to a known IP range.
- aws:SecureTransport -- deny requests that don't use TLS.
- aws:PrincipalOrgID -- limit access to principals from your own AWS Organization.
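For example, a Deny statement can block S3 access from outside a corporate IP range (the CIDR here is illustrative):

JSON -- IP Restriction Condition (illustrative CIDR)
{
  "Sid": "DenyOutsideCorpNetwork",
  "Effect": "Deny",
  "Action": "s3:*",
  "Resource": "arn:aws:s3:::raw-data-source-bucket/*",
  "Condition": {
    "NotIpAddress": { "aws:SourceIp": "203.0.113.0/24" }
  }
}

One caution: aws:SourceIp refers to the caller's public IP, so requests arriving through a VPC endpoint won't match it -- test IP conditions carefully before relying on them.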



Common Mistakes

- Reaching for s3:* or "Resource": "*" to make an Access Denied error go away, instead of adding the one missing action or ARN.
- Forgetting that s3:ListBucket applies to the bucket ARN while s3:GetObject and s3:PutObject apply to the object ARN (the /* form) -- you usually need both resource entries.
- Attaching broad managed policies like AdministratorAccess to service roles as a "temporary" fix that never gets removed.
- Never revisiting policies: permissions only accumulate unless a tool like IAM Access Analyzer helps you shrink them.

IAM Access Analyzer


IAM Access Analyzer is a free AWS tool that identifies overly permissive policies. Enable it in your account and it'll flag policies that grant access to external principals (other accounts, public access) or use wildcard resources where specific ARNs would be safer.


It also has a policy generation feature: point it at CloudTrail logs for a specific role, and it generates a least-privilege policy based on the actual API calls that role made over the past 90 days. This is invaluable for tightening permissions on existing roles without breaking anything. For S3 architecture that these policies protect, see our S3 data lake architecture guide.


Service Control Policies (SCPs)


If you manage multiple AWS accounts via AWS Organizations, SCPs set guardrails that apply across all accounts. Common data engineering SCPs include: deny resource creation outside approved regions, deny disabling of CloudTrail, deny public S3 bucket creation, and require encryption on all S3 objects. SCPs are the organizational safety net -- even if someone attaches AdministratorAccess to a role, the SCP Deny still blocks the restricted actions.
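A region-guardrail SCP along those lines might look like this. The NotAction list exempts global services that aren't region-bound; the exact list is illustrative and usually needs tuning for your organization:

JSON -- SCP: Deny Actions Outside Approved Regions (illustrative)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyOutsideApprovedRegions",
      "Effect": "Deny",
      "NotAction": [
        "iam:*", "organizations:*", "sts:*",
        "support:*", "cloudfront:*", "route53:*"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["us-east-1", "us-west-2"]
        }
      }
    }
  ]
}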


Key Takeaways

- Scope every statement to exact bucket, database, and table ARNs; no s3:*, no "Resource": "*".
- When a job needs new data, add a specific ARN -- don't widen the policy.
- Use Condition blocks (region, IP, TLS) as guardrails on top of actions and resources.
- Let IAM Access Analyzer generate and tighten policies from real CloudTrail activity.
- Back it all with SCPs, so even an over-permissioned role can't cross organizational red lines.
Chakri, Cloud Solutions Architect

Chakri is a Cloud Solutions Architect at CelestInfo with hands-on experience across AWS, Azure, GCP, and Snowflake cloud infrastructure.


Frequently Asked Questions

Q: What is the principle of least privilege in AWS IAM?

Least privilege means granting only the minimum permissions required. A Glue job reading from one S3 bucket should have s3:GetObject on that specific bucket ARN -- not s3:* on "Resource": "*". This limits the blast radius if credentials are compromised or code has bugs.

Q: How do I set up cross-account S3 access?

Three components: (1) a bucket policy on the source account granting access to the consuming account's role ARN, (2) an IAM role in the consuming account with STS AssumeRole trust, and (3) a policy on the consuming role with s3:GetObject scoped to the source bucket. The consuming service calls sts:AssumeRole to get temporary credentials.

Q: What is the maximum IAM policy size?

Inline policy limits depend on the identity type: 2,048 characters for a user, 5,120 for a group, and 10,240 for a role. Managed policies are capped at 6,144 characters (whitespace excluded). By default you can attach up to 10 managed policies per role (a quota that can be raised to 20), giving an effective ceiling of ~61,440 characters. If you're hitting these limits, consider using IAM policy variables, S3 Access Points, or tag-based conditions to reduce policy verbosity.

Q: How does IAM policy evaluation work?

Explicit Deny always wins. For a request to succeed, it must be allowed by any applicable Organization SCPs and by an identity-based policy (or, for cross-account access, a resource-based policy on the target) -- and it must not be explicitly denied at any level. If any policy anywhere explicitly denies the action, the request fails regardless of Allow statements elsewhere. This makes Deny the most reliable way to enforce restrictions across an organization.
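A small illustration: even if another attached policy grants s3:* on everything, the statement below still blocks deletes on the named bucket (the bucket name is illustrative):

JSON -- Explicit Deny Wins (illustrative bucket)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BlockDeletes",
      "Effect": "Deny",
      "Action": ["s3:DeleteObject", "s3:DeleteBucket"],
      "Resource": [
        "arn:aws:s3:::critical-archive-bucket",
        "arn:aws:s3:::critical-archive-bucket/*"
      ]
    }
  ]
}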
