
EMR Serverless with Python Dependencies

| Key | Value |
| --- | --- |
| Services | EMR Serverless, S3, IAM |
| Integrations | Terraform, AWS CLI |
| Categories | Analytics; Big Data; Spark |

Introduction

A demo application illustrating how to add Python dependencies to an EMR Serverless Spark job using LocalStack. This sample implements a workaround for mounting Python environments directly into the LocalStack container, enabling PySpark jobs with custom dependencies to run locally.
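Conceptually, the workaround amounts to bind-mounting the locally built Python environment into the LocalStack container, so the Spark job can find an interpreter at a fixed container path. A minimal docker-compose sketch of that idea (the service name and exact paths are assumptions; see the repository's docker-compose.yml for the real values):

```yaml
services:
  localstack:
    image: localstack/localstack-pro
    volumes:
      # Assumed mapping: the locally built dependency folder is exposed
      # at the path the Spark job later uses as its Python interpreter root.
      - ./pyspark_env:/tmp/environment
```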

Prerequisites

Check that all prerequisites are installed:

make check

Installation

This initializes your Terraform workspaces:

make init

Build the Python dependencies for the Spark job. For LocalStack, we create a /pyspark_env folder that is mounted into the LocalStack container (rather than packaging it as a tarball like in AWS):

# For LocalStack: creates /pyspark_env folder
make build

# For AWS: creates pyspark_deps.tar.gz
make build-aws
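For the AWS path, the layout of the tarball matters: the archive must unpack to an environment/ prefix so the interpreter can later be addressed as environment/bin/python. An illustrative sketch (file names here are stand-ins, not taken from the Makefile):

```shell
# Illustrative only: fake a packed environment to show the expected layout.
mkdir -p environment/bin
touch environment/bin/python             # stand-in for the real interpreter
tar -czf pyspark_deps.tar.gz environment
tar -tzf pyspark_deps.tar.gz             # lists environment/, environment/bin/python
```

In the real build, the environment/ folder would contain a full packed virtualenv rather than an empty placeholder.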

Start LocalStack

export LOCALSTACK_AUTH_TOKEN=<your-auth-token>
make start

Deploy the application

The deploy target creates the following resources via Terraform: an IAM role, an IAM policy, an S3 bucket, and an EMR Serverless application.

# Deploy to LocalStack (starts LocalStack via docker-compose and applies Terraform)
LOCALSTACK_AUTH_TOKEN=$LOCALSTACK_AUTH_TOKEN make deploy

# Deploy to AWS
make deploy-aws
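The EMR Serverless application itself corresponds to a small Terraform resource. A hedged sketch of what the deploy step provisions (the name and release label are assumptions, not the repository's actual values):

```hcl
# Illustrative only: the core EMR Serverless resource behind `make deploy`.
resource "aws_emrserverless_application" "spark" {
  name          = "pyspark-deps-demo" # assumed application name
  release_label = "emr-6.9.0"         # assumed EMR release
  type          = "spark"
}
```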

Run the application

We can finally run our Spark job. Notice the difference in start_job.sh between LocalStack and AWS: for AWS, the packed environment uploaded via spark.archives is unpacked by Spark, and the interpreter is referenced by the relative path environment/bin/python; for LocalStack, we rely on the volume mount into the container and use the absolute path /tmp/environment/bin/python.

# LocalStack
make run

# AWS
make run-aws
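The interpreter-path difference can be sketched as follows (the variable names are illustrative; the real values live in start_job.sh):

```shell
# Illustrative toggle between deploy targets; defaults to LocalStack.
TARGET="${TARGET:-localstack}"

if [ "$TARGET" = "localstack" ]; then
  # Absolute path inside the LocalStack container, provided by the volume mount.
  PYSPARK_PYTHON="/tmp/environment/bin/python"
else
  # Relative path into the environment unpacked from spark.archives on AWS.
  PYSPARK_PYTHON="environment/bin/python"
fi

echo "--conf spark.emr-serverless.driverEnv.PYSPARK_PYTHON=$PYSPARK_PYTHON"
```

The echoed flag shows the kind of Spark property the job submission would set; the executor-side property would be configured analogously.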

Destroy the application

# LocalStack
make destroy

# AWS
make destroy-aws

License

This code is available under the Apache 2.0 license.