Nvidia GenAI Stack

Nvidia NeMo

GenAI framework
on DGX Cloud/Kubernetes Clusters
AutoConfigurator
SFT and PEFT

Nvidia Triton

Inference Server
TensorRT-LLM example

Nvidia Merlin

Recommender system

Stream Batch process

May 31 2025

A zhihu blog post popped up on my front page with some discussion of stream and batch processing, so I followed a couple of the passages and here are some high l...

CUDA

May 21 2025

1 Concepts: thread; thread block, which consists of warps and is executed on an SM (Streaming Multiprocessor); warp, a group of 32 threads. A warp is executed physically ...

Slurm and Enroot

May 19 2025

Finally touching on the Slurm system. I first heard about it during my CGG days, when we briefly discussed using it for cluster jobs. But our own implementation ...

NVLink, InfiniBand and SpectrumX

May 13 2025

Summary of a zhihu post; some pictures are from here.