LLM overview in 2024
- Why SFT comes before RLHF
- RLHF needs basic capability from the model, which SFT provides first.
- 2 steps in RLHF
If the model only learned from human feedback directly, it would know the answer to those specific questions, but NOT to general questions.
- Learn human preference by training a Reward Model that gives rewards for answers (see the first sketch after this list)
- Adjust the network's output based on those rewards (see the second sketch below)
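
A minimal sketch of step 1, assuming a hypothetical pretrained `backbone` that pools a (prompt, answer) pair into a single hidden vector. The Reward Model is just a scalar head trained with a pairwise (Bradley-Terry style) loss so the human-preferred answer gets the higher reward; only the reward gap between the two answers matters, not the absolute values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scalar reward head on top of a (hypothetical) pooled encoder."""
    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone                      # assumed to return (batch, hidden_size)
        self.value_head = nn.Linear(hidden_size, 1)   # hidden state -> scalar reward

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)             # pooled representation of prompt + answer
        return self.value_head(hidden).squeeze(-1)    # one reward per example

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Pairwise objective: the human-preferred answer should score higher than the rejected one.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```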
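
And a sketch of step 2. Real pipelines typically use PPO with a KL penalty toward the SFT model; this simpler REINFORCE-style update, with hypothetical `policy.generate`, `policy.log_prob`, and `reward_model.score` helpers, just shows the core idea of pushing the network toward higher-reward outputs.

```python
import torch

def rlhf_step(policy, reward_model, prompts, optimizer):
    # Sample one answer per prompt from the current policy (the SFT-initialized LLM).
    answers = [policy.generate(p) for p in prompts]
    # Score each (prompt, answer) pair with the trained Reward Model.
    rewards = torch.tensor([reward_model.score(p, a) for p, a in zip(prompts, answers)])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # simple normalization baseline
    # Log-probability the policy assigns to each sampled answer.
    log_probs = torch.stack([policy.log_prob(p, a) for p, a in zip(prompts, answers)])
    # Policy-gradient loss: make higher-reward answers more probable.
    loss = -(rewards * log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```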
- Alignment
SFT + RLHF = Alignment (to human preferences)