ECS Deployment Details
Notes on adding the Dynamo example to ECS, along with a couple of pitfalls.
1 Cluster Setup
- GPU support only comes with an EC2 cluster, so NO Fargate
- The AMI has to be GPU compatible (Amazon Linux 2 GPU)
- Bottlerocket is an open-source, Linux-based operating system meant for hosting containers (for some reason I could not SSH into it)
- Set up the security group properly for inbound traffic
- Add an SSH key
- Set an appropriate EBS volume size
2 Task Definition Setup
- Choose EC2 so we can modify the network mode
- awsvpc network mode is required for tasks hosted on Fargate.
- Set bridge mode so the container can get outbound traffic
- Still not sure why awsvpc mode failed to send outbound traffic
- Task size: allocate resources for the task; this is also what causes the ASG (Auto Scaling Group) to work
- Container resource limit: set GPU=1 to get GPU nodes
- Add the `ETCD_ENDPOINTS` and `NATS_SERVER` env vars.
- Container configuration: set the entrypoint to `sh -c` and add the actual dynamo command as the command.
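The settings in this list can be collected into a single task definition. Below is a minimal sketch in Python that builds the kind of request body one might pass to boto3's `register_task_definition`. The family name, image, task size, endpoint URLs, and the dynamo command are all placeholders introduced for illustration; only the EC2 launch type, bridge network mode, GPU=1, the two env var names, and the `sh -c` entrypoint come from the notes above.

```python
import json

# Hypothetical placeholders: replace with your own values.
DYNAMO_IMAGE = "<your-registry>/<your-dynamo-image>:<tag>"
ETCD_ENDPOINTS = "<etcd-endpoint-url>"
NATS_SERVER = "<nats-server-url>"
DYNAMO_COMMAND = "<your dynamo command>"


def build_gpu_task_definition() -> dict:
    """Build an EC2 launch-type task definition that requests one GPU."""
    return {
        "family": "dynamo-gpu-example",      # assumed name
        "requiresCompatibilities": ["EC2"],  # EC2 so the network mode can be changed
        "networkMode": "bridge",             # bridge mode gave outbound traffic
        "cpu": "4096",                       # task size, assumed values
        "memory": "16384",
        "containerDefinitions": [
            {
                "name": "dynamo",            # assumed container name
                "image": DYNAMO_IMAGE,
                "essential": True,
                # GPU=1 so the task is placed on a GPU instance
                "resourceRequirements": [{"type": "GPU", "value": "1"}],
                "environment": [
                    {"name": "ETCD_ENDPOINTS", "value": ETCD_ENDPOINTS},
                    {"name": "NATS_SERVER", "value": NATS_SERVER},
                ],
                # sh -c as the entrypoint, the dynamo command as the command
                "entryPoint": ["sh", "-c"],
                "command": [DYNAMO_COMMAND],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_gpu_task_definition(), indent=2))
```

The printed JSON can be compared against what the ECS console generates for the same settings.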
3 Tips and Troubleshooting
- Multi-node deployment can be achieved with separate tasks, for example one task for etcd/nats.
- Multiple containers in one task can cause non-repeatable errors, so split the containers into multiple tasks
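When etcd/nats run in their own task, the Dynamo task has to point at that task's address instead of localhost. A minimal sketch, assuming a hypothetical helper `dynamo_environment` and that the etcd/nats task is reachable at a known host or IP; the ports and URL schemes mirror the hello world example in this document.

```python
def dynamo_environment(infra_host: str) -> list:
    """Env vars for a Dynamo task whose etcd/nats run in a separate task.

    infra_host is the address of the etcd/nats task (hypothetical input;
    how you discover it depends on your networking setup).
    """
    return [
        {"name": "ETCD_ENDPOINTS", "value": f"http://{infra_host}:2379"},
        {"name": "NATS_SERVER", "value": f"http://{infra_host}:4222"},
    ]


if __name__ == "__main__":
    for var in dynamo_environment("10.0.0.12"):  # example address only
        print(f'{var["name"]}={var["value"]}')
```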
4 Dynamo Deployment of Hello World Example on AWS ECS
1. ECS Cluster Setup
- Go to the AWS ECS console, open the Clusters tab, and click Create cluster
- Enter the cluster name and choose AWS Fargate as the infrastructure. This option creates a serverless cluster to deploy containers
- Click Create; the cluster will be deployed through CloudFormation.
2. Task Definition Setup
We need to start 3 containers for the hello world example. A sample task definition JSON is attached.
- ETCD container
  - Container name: use `etcd`
  - Image URL is `bitnami/etcd:3.6.1`, and Yes for Essential container
  - Container port

    |Container port|Protocol|Port name|App protocol|
    |-|-|-|-|
    |2379|TCP|2379|HTTP|
    |2380|TCP|2380|HTTP|

  - Environment variable key is `ALLOW_NONE_AUTHENTICATION` and value is `YES`
- NATS container
  - Container name: use `nats`
  - Image URL is `nats:2.11.4`, and Yes for Essential container
  - Container port

    |Container port|Protocol|Port name|App protocol|
    |-|-|-|-|
    |4222|TCP|4222|HTTP|
    |6222|TCP|6222|HTTP|
    |8222|TCP|8222|HTTP|

  - Docker configuration: add `-js, --trace` in Command
- Dynamo hello world pipeline container
  - Container name: use `dynamo-hello-world-pipeline`
  - Add your Image URL, and Yes for Essential container. It can be an AWS ECR URL or an NVIDIA NGC URL. If using an NGC URL, also choose Private registry authentication and add your Secrets Manager ARN or name.
  - Container port

    |Container port|Protocol|Port name|App protocol|
    |-|-|-|-|
    |8000|TCP|8000|HTTP|

  - Environment variables

    |Key|Value type|Value|
    |-|-|-|
    |ETCD_ENDPOINTS|Value|http://localhost:2379|
    |NATS_SERVER|Value|http://localhost:4222|

  - Docker configuration: add `sh,-c` in Entry point and `cd src && uv run dynamo serve hello_world:Frontend` in Command
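The three containers above can also be written out as one task definition. The following is a sketch in Python, not the attached sample JSON: it builds a dictionary in the shape boto3's `register_task_definition` accepts. The family name, task-level CPU/memory, and the helper names `port_mappings` and `build_hello_world_task` are assumptions for illustration; the container names, images, ports, env vars, and commands are taken from the lists above, and port names are assumed to mirror the port numbers.

```python
import json
from typing import Optional


def port_mappings(ports: list) -> list:
    """TCP port mappings whose port name mirrors the port number."""
    return [
        {"containerPort": p, "protocol": "tcp", "name": str(p), "appProtocol": "http"}
        for p in ports
    ]


def build_hello_world_task(dynamo_image: str,
                           registry_secret_arn: Optional[str] = None) -> dict:
    """Fargate task definition with etcd, nats, and the Dynamo pipeline."""
    dynamo = {
        "name": "dynamo-hello-world-pipeline",
        "image": dynamo_image,
        "essential": True,
        "portMappings": port_mappings([8000]),
        "environment": [
            {"name": "ETCD_ENDPOINTS", "value": "http://localhost:2379"},
            {"name": "NATS_SERVER", "value": "http://localhost:4222"},
        ],
        "entryPoint": ["sh", "-c"],
        "command": ["cd src && uv run dynamo serve hello_world:Frontend"],
    }
    if registry_secret_arn:
        # Private registry authentication (for example, for an NGC image)
        dynamo["repositoryCredentials"] = {"credentialsParameter": registry_secret_arn}

    return {
        "family": "dynamo-hello-world",  # assumed name
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",         # required for tasks hosted on Fargate
        "cpu": "2048",                   # assumed task size
        "memory": "4096",
        "containerDefinitions": [
            {
                "name": "etcd",
                "image": "bitnami/etcd:3.6.1",
                "essential": True,
                "portMappings": port_mappings([2379, 2380]),
                "environment": [
                    {"name": "ALLOW_NONE_AUTHENTICATION", "value": "YES"}
                ],
            },
            {
                "name": "nats",
                "image": "nats:2.11.4",
                "essential": True,
                "portMappings": port_mappings([4222, 6222, 8222]),
                "command": ["-js", "--trace"],
            },
            dynamo,
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_hello_world_task("<your-image-url>"), indent=2))
```

Because all three containers share one task in awsvpc mode, they share a network namespace, which is why the Dynamo container can reach etcd and NATS on localhost.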
3. Task Deployment
You can create a service or run the task directly from the task definition
- Environment setup
  - For Existing cluster, choose the Fargate cluster created in step 1.
- Networking setup
  - Make sure your security group has an inbound rule for port 22, so that you can SSH into the instance for debugging purposes
  - Turn on public IP
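The same deployment settings can be expressed as API request bodies. This is a sketch, assuming boto3's ECS `run_task` and EC2 `authorize_security_group_ingress` are used in place of the console; the helper names, cluster name, subnet and security group IDs, and CIDR range are hypothetical placeholders.

```python
def build_run_task_request(cluster: str, task_definition: str,
                           subnet_ids: list, security_group_ids: list) -> dict:
    """Keyword arguments for running the task once on Fargate with a public IP."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnet_ids,
                "securityGroups": security_group_ids,
                "assignPublicIp": "ENABLED",  # "Turn on public IP"
            }
        },
    }


def ssh_ingress_rule(cidr: str) -> dict:
    """One IpPermissions entry that allows inbound TCP port 22 from cidr."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": cidr}],
    }


if __name__ == "__main__":
    request = build_run_task_request(
        cluster="<your-cluster-name>",
        task_definition="<your-task-definition>",
        subnet_ids=["<subnet-id>"],
        security_group_ids=["<security-group-id>"],
    )
    print(request)
    print(ssh_ingress_rule("<your-ip>/32"))
```

Restricting the SSH rule to your own address range rather than opening it to everyone is the safer choice for a debugging-only rule.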