Nvidia GenAI Stack


Nvidia NeMo

GenAI framework
on DGX Cloud/Kubernetes clusters
AutoConfigurator
SFT and PEFT

Nvidia Triton

Inference Server
TensorRT-LLM example

Nvidia Merlin

Recommender system


Preprocessing/Processor in vLLM

June 30 2025

Here is the preprocessing-related code in vLLM.

Weight loading in vLLM

June 27 2025

Fixed a weight loading error. During weight loading, vLLM was reporting "There is no module or parameter named sth". It took me a couple of days to root cause this issu...

ECS Deployment Details

June 24 2025

Adding a Dynamo example to ECS, and a couple of pitfalls

AWS ECS

June 18 2025

I finally got access to an AWS account again and volunteered to test deploying Dynamo on AWS ECS, which I had only barely touched back when Fargate was released.