LLM overview in 2024

  1. Why SFT comes before RLHF
    • RLHF requires the model to already have basic capabilities, which are built during supervised fine-tuning (SFT).
  2. Two steps in RLHF
    If the model only learns directly from human feedback, it learns the answer to each specific question it was graded on, but not how to answer questions in general.
    • Learn human preferences by training a reward model that scores answers (see the reward-model sketch below).
    • Adjust the network's output based on those rewards (see the policy-update sketch below).
  3. Alignment
    SFT + RLHF = Alignment (to human)
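
A minimal sketch of the first RLHF step, assuming PyTorch: the reward model learns human preferences from pairs where labelers picked one answer over another. The class name, hidden size, and random tensors below are illustrative stand-ins, not anything from this post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a pooled (prompt, answer) representation to a scalar reward."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, answer_repr: torch.Tensor) -> torch.Tensor:
        # answer_repr: (batch, hidden_size) pooled representation of prompt + answer
        return self.score(answer_repr).squeeze(-1)  # (batch,)

reward_model = RewardModel()
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)

# Toy batch: labelers preferred the "chosen" answer over the "rejected" one.
chosen_repr = torch.randn(4, 768)
rejected_repr = torch.randn(4, 768)

r_chosen = reward_model(chosen_repr)
r_rejected = reward_model(rejected_repr)

# Pairwise (Bradley-Terry) loss: push the chosen answer's reward above the rejected one's.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
```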

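A minimal sketch of the second step, again assuming PyTorch: a simplified REINFORCE-style update (production systems typically use PPO) that nudges the policy toward answers the reward model scores highly, with a KL-style penalty against the frozen SFT model so the policy does not drift too far. All tensors and coefficients here are illustrative.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 100, 8

# Stand-ins for the policy (trainable) and the frozen SFT reference model.
policy_logits = torch.randn(1, seq_len, vocab_size, requires_grad=True)
ref_logits = torch.randn(1, seq_len, vocab_size)

# A sampled answer and the scalar reward the reward model assigned to it.
answer_tokens = torch.randint(0, vocab_size, (1, seq_len))
reward = torch.tensor(0.7)   # output of the trained reward model
kl_coef = 0.1                # penalty for drifting away from the SFT model

log_probs = F.log_softmax(policy_logits, dim=-1)
ref_log_probs = F.log_softmax(ref_logits, dim=-1)

# Log-probability of each sampled token under the policy and the reference.
token_logp = log_probs.gather(-1, answer_tokens.unsqueeze(-1)).squeeze(-1)
ref_token_logp = ref_log_probs.gather(-1, answer_tokens.unsqueeze(-1)).squeeze(-1)

# Shape the reward with a KL-style penalty, treated as a constant for the update.
with torch.no_grad():
    shaped_reward = reward - kl_coef * (token_logp - ref_token_logp).sum()

# REINFORCE-style step: raise the log-likelihood of highly rewarded answers.
loss = -shaped_reward * token_logp.sum()
loss.backward()
```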