TensorRT-LLM Backend


Watched Faradawn Yang’s talk about TRTLLM in the EZ channel, which is really well articulated. I further watched his vLLM and SGLang talks, which were also inspiring.

0 Overall view of these three backends

These three engines are actually optimizing inference from different perspectives.

1 How TRTLLM stands out

  1. Loading HF weights with vLLM works like a Python interpreter: the weights are loaded and executed eagerly. TRTLLM instead works like g++: it compiles an engine ahead of time for the target hardware type and executes it later (see the first sketch after this list).
  2. TRTLLM uses kernel auto-tuning, which searches over different matrix sizes for multiplication and keeps the fastest kernel for each. The challenge is the huge search space, but it can be mitigated by limiting the set of batch sizes (see the second sketch below).
  3. When the exact batch size is not known in advance, a request queue is the solution: requests wait in the queue and are batched into one of the pre-tuned sizes (see the third sketch below).
  4. It also comes with multiple other compile options.
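To make the interpreter-vs-compiler contrast concrete, here is a minimal sketch. The vLLM call is its real high-level API; the `tensorrt_llm.LLM` import mirrors the LLM API that newer TensorRT-LLM releases ship, but exact details vary by release, and the model name is just a placeholder.

```python
from vllm import LLM, SamplingParams

# vLLM: "interpreter" style -- HF weights are loaded and run directly.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder model
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))

# TensorRT-LLM: "compiler" style -- an engine is first built for this exact
# GPU (weight conversion + engine build), then the prebuilt engine is run.
# Sketch only; API details vary across TensorRT-LLM releases.
from tensorrt_llm import LLM as TrtLLM

trt_llm = TrtLLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # builds engine
trt_outputs = trt_llm.generate(["Hello, my name is"])
```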
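A toy version of kernel auto-tuning (not TensorRT's real tactic profiler): time a few candidate implementations per problem shape, keep the fastest, and only tune the handful of batch sizes the engine will actually see. That restriction is exactly what keeps the search space manageable. All names here are hypothetical.

```python
import time
import torch

# Hypothetical candidate "tactics" standing in for real GEMM kernels.
TACTICS = {
    "fp32": lambda a, b: a @ b,
    "fp16": lambda a, b: (a.half() @ b.half()).float(),
}

def tune_shape(m: int, n: int, k: int, iters: int = 20) -> str:
    """Benchmark each tactic on one (m, k) x (k, n) matmul; return the fastest."""
    a = torch.randn(m, k, device="cuda")
    b = torch.randn(k, n, device="cuda")
    best, best_t = None, float("inf")
    for name, fn in TACTICS.items():
        fn(a, b)  # warm-up
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            fn(a, b)
        torch.cuda.synchronize()
        t = (time.perf_counter() - t0) / iters
        if t < best_t:
            best, best_t = name, t
    return best

# Tuning only a few batch sizes keeps the search space small.
TUNED = {bs: tune_shape(bs, 4096, 4096) for bs in (1, 8, 32)}
print(TUNED)
```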
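And a sketch of the queue idea: requests arrive with unpredictable timing, so they sit in a queue and are drained into one of the pre-tuned batch sizes. The bucketing policy here (round down to the nearest tuned size and re-queue the rest) is my assumption; a real server would more likely pad up to a bucket.

```python
import queue

requests: "queue.Queue[str]" = queue.Queue()
TUNED_BATCH_SIZES = (1, 8, 32)  # sizes the engine was built/tuned for

def next_batch() -> list[str]:
    """Drain up to the largest tuned size, then round down to a tuned
    batch size so the batch always hits a pre-tuned kernel."""
    items = []
    while len(items) < max(TUNED_BATCH_SIZES) and not requests.empty():
        items.append(requests.get())
    size = max((b for b in TUNED_BATCH_SIZES if b <= len(items)), default=0)
    batch, rest = items[:size], items[size:]
    for r in rest:
        requests.put(r)  # leftovers wait for the next batch
    return batch
```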

2 CUDA Graph Capture

I have heard this term mentioned multiple times and am still not sure what it means exactly. Here is a simple comparison; I will dive deeper later.
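A minimal PyTorch sketch of the difference: in eager mode every kernel is launched from Python one by one, while with `torch.cuda.CUDAGraph` the whole launch sequence is recorded once and replayed as a single unit, removing per-kernel launch overhead. The model and shapes are placeholders.

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
static_in = torch.randn(8, 1024, device="cuda")

# Eager: each forward pass launches its kernels from Python every time.
eager_out = model(static_in)

# Warm-up on a side stream (required before capture).
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_in)
torch.cuda.current_stream().wait_stream(s)

# Capture: record the kernel launch sequence once into a graph.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_out = model(static_in)

# Replay: copy new data into the captured input buffer and relaunch the
# whole recorded sequence with a single call.
static_in.copy_(torch.randn(8, 1024, device="cuda"))
g.replay()
print(static_out.norm())
```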
