Kyle's Blog

    Nvidia GenAI Stack



    Nvidia NeMo

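    • Framework for building and customizing generative AI models
    • Runs on DGX Cloud or Kubernetes clusters
    • Auto Configurator: searches for efficient training and inference configurations
    • Supports supervised fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT)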
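
To make the SFT vs. PEFT distinction concrete, here is a rough back-of-the-envelope sketch. It assumes LoRA as the PEFT method (one common choice, not necessarily the one used in any given NeMo recipe) and a single hypothetical 4096 x 4096 weight matrix with rank 8; the function names are made up for illustration and are not part of NeMo.

```python
# Sketch: compare trainable parameters for full fine-tuning (SFT)
# versus a LoRA-style adapter on one linear layer.
# Assumption: LoRA adds two low-rank factors, A (rank x d_in) and
# B (d_out x rank), and freezes the original weight matrix.

def full_params(d_in: int, d_out: int) -> int:
    """Weights updated when the whole d_out x d_in matrix is trained."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights updated when only the low-rank factors A and B are trained."""
    return rank * d_in + d_out * rank


d = 4096   # hypothetical hidden size
r = 8      # hypothetical LoRA rank

full = full_params(d, d)
lora = lora_trainable_params(d, d, r)
ratio = lora / full

print(f"full fine-tuning: {full:,} trainable weights")
print(f"LoRA (rank {r}):   {lora:,} trainable weights")
print(f"ratio:            {ratio:.4%}")
```

Under these assumptions the adapter trains 1/256 of the weights in that layer, which is the main reason PEFT fits on smaller GPU budgets than full SFT.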

    Nvidia Triton

    • Inference Server
    • TensorRT-LLM example
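
As a minimal sketch of what talking to a Triton server looks like, the snippet below only builds the URL and JSON body for Triton's HTTP generate endpoint and sends nothing over the network. The model name `ensemble`, the host `localhost:8000`, and the field names `text_input` and `max_tokens` are assumptions borrowed from the usual TensorRT-LLM backend setup; your model repository may use different names. `build_generate_request` is a hypothetical helper, not a Triton API.

```python
import json


def build_generate_request(prompt, max_tokens=64,
                           model="ensemble", host="localhost:8000"):
    """Return (url, body) for a POST to Triton's generate endpoint.

    Assumption: the deployed model accepts `text_input` and `max_tokens`,
    as in the common TensorRT-LLM ensemble configuration.
    """
    url = f"http://{host}/v2/models/{model}/generate"
    body = json.dumps({"text_input": prompt, "max_tokens": max_tokens})
    return url, body


url, body = build_generate_request("What is Triton?", max_tokens=32)
print(url)
print(body)
```

The pair can then be sent with any HTTP client, for example `curl -X POST <url> -d '<body>'`, once a server with a matching model is actually running.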

    Nvidia Merlin

    • Framework for building recommender systems

    Tags: GPU

    Categories: Study

    Updated: February 25, 2024

