Eagle 1/2/3 + HASS


Speculative Decoding with Eagles

0 Medusa review

Zhihu explains that the tree attention Medusa builds has $\sum_{i=1}^{N}\prod_{j=1}^{i} C_j$ branches ($N$ heads, with $C_j$ candidate tokens for head $j$). So pruning is critical for Medusa: with 4 heads and candidates [10, 10, 9, 4], the number of branches drops from 4610 to 64.
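The count above can be checked with a few lines of Python. This is a minimal sketch; the function name `medusa_tree_size` is made up for illustration and is not part of Medusa.

```python
from itertools import accumulate
from operator import mul

def medusa_tree_size(candidates_per_head):
    """Count branches in the full (unpruned) Medusa tree.

    Depth i contributes C_1 * C_2 * ... * C_i branches, so the total is
    the sum of the running products of the per-head candidate counts.
    """
    return sum(accumulate(candidates_per_head, mul))

# 10 + 10*10 + 10*10*9 + 10*10*9*4
print(medusa_tree_size([10, 10, 9, 4]))
```

The last term alone (3600) dominates the total, which is why the deeper heads are pruned hardest.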

1 Eagle 1

Instead of token-level decoding as in Medusa, Eagle uses feature-level decoding. Eagle uses both tokens and features as input and also adds causal (autoregressive) dependence between draft steps. A figure from the Eagle paper shows the differences. Eagle builds a static draft tree and runs multiple forward passes through the 1-layer transformer to fill it.
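To make the feature-level loop concrete, here is a toy sketch of drafting a single chain (the tree is ignored for brevity). Everything here is an assumption for illustration: `draft_chain`, `draft_layer` and `lm_head` are hypothetical names, features are plain floats, and decoding is greedy. The one thing it tries to show is that each feature is paired with the token sampled after it, and the predicted feature is fed back in for the next step.

```python
def draft_chain(features, tokens, draft_layer, lm_head, num_steps):
    """Draft num_steps tokens autoregressively at the feature level.

    features: target-model features f_1..f_n (toy: floats)
    tokens:   tokens shifted one step ahead, t_2..t_{n+1}
    draft_layer(pairs) -> predicted next feature (hypothetical callable)
    lm_head(feature)   -> next token (hypothetical, greedy for simplicity)
    """
    features, tokens = list(features), list(tokens)
    drafted = []
    for _ in range(num_steps):
        # Pair each feature with the token that was sampled after it.
        pairs = list(zip(features, tokens))
        next_feature = draft_layer(pairs)   # predict the next feature
        next_token = lm_head(next_feature)  # pick the next token from it
        features.append(next_feature)       # feed the draft feature back in
        tokens.append(next_token)
        drafted.append(next_token)
    return drafted

# Toy stand-ins so the sketch runs; a real draft layer is a transformer block.
def toy_layer(pairs):
    last_feature, last_token = pairs[-1]
    return last_feature + last_token

def toy_head(feature):
    return int(feature) % 7

print(draft_chain([0.5, 1.5], [3, 4], toy_layer, toy_head, num_steps=3))
```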

2 Eagle 2

Eagle 2 changes the static draft tree into a dynamic one, built in two phases:

  • Expand phase: Drop nodes whose probability is below a threshold and expand only the remaining, most promising nodes.
  • Rerank phase: Rerank all remaining nodes and keep the top K.
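The two phases can be sketched as below. This is a simplified illustration, not Eagle 2's code: `build_dynamic_tree`, `draft_probs`, `expand_k` and `keep_n` are hypothetical names, a node's score is assumed to be the product of draft probabilities along its path, and the expand phase uses a top-k cut per layer in place of a fixed threshold.

```python
import heapq

def rank_key(node):
    """Higher score first; on ties prefer the shallower node so a parent
    never ranks below its own child."""
    score, path = node
    return (score, -len(path))

def build_dynamic_tree(draft_probs, depth, expand_k, keep_n):
    """draft_probs(path) -> {token: prob} is a hypothetical draft model."""
    frontier = [(1.0, ())]  # (score, path) pairs, starting at the root
    all_nodes = []
    for _ in range(depth):
        children = [
            (score * prob, path + (token,))
            for score, path in frontier
            for token, prob in draft_probs(path).items()
        ]
        all_nodes.extend(children)
        # Expand phase: only the best expand_k nodes are expanded further.
        frontier = heapq.nlargest(expand_k, children, key=rank_key)
    # Rerank phase: rank every drafted node and keep the best keep_n.
    return heapq.nlargest(keep_n, all_nodes, key=rank_key)

# Toy draft model that ignores the context.
def toy_draft_probs(path):
    return {"a": 0.6, "b": 0.3, "c": 0.1}

for score, path in build_dynamic_tree(toy_draft_probs, depth=2, expand_k=2, keep_n=3):
    print(path, round(score, 2))
```

Because a child's score is its parent's score times a probability of at most 1, the kept nodes still form a connected tree.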

3 HASS

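HArmonized Speculative Sampling (HASS) aims to close the gap between the features used in training and those used in inference: training conditions on the target model's features, while inference has to condition on the draft model's own predicted features. Zhihu explains it and introduces another work, CORAL. The solution is multi-step training, which feeds the features produced by the draft model back into training.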
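A toy unrolling shows why the gap matters and what multi-step training sees. This is a rough sketch under stated assumptions, not the HASS implementation: `multi_step_losses` and `draft_model` are hypothetical, features are floats, the loss is a plain squared error, and tokens are left out. At step s the last s-1 context features are the draft model's own earlier predictions, as they would be at inference.

```python
def multi_step_losses(target_feats, draft_model, num_steps):
    """Return one mean squared error per training step.

    Step 1 conditions on target features only. Step s replaces the last
    s-1 context features with features drafted in steps 1..s-1.
    Assumes len(target_feats) > num_steps.
    """
    n = len(target_feats)
    preds = {}  # preds[(step, pos)] = feature drafted for position pos
    losses = []
    for step in range(1, num_steps + 1):
        errors = []
        for pos in range(step, n):
            # True features up to pos - step, then self-drafted ones.
            context = list(target_feats[: pos - step + 1])
            context += [preds[(k, pos - step + k)] for k in range(1, step)]
            preds[(step, pos)] = draft_model(context)
            errors.append((preds[(step, pos)] - target_feats[pos]) ** 2)
        losses.append(sum(errors) / len(errors))
    return losses

# Toy draft model with a small systematic error: the true step is +1.0.
def toy_draft(context):
    return context[-1] + 0.9

print(multi_step_losses([0.0, 1.0, 2.0, 3.0], toy_draft, num_steps=3))
```

With this toy model the error grows with every step that conditions on drafted features, which is the mismatch that single-step training never observes.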

4 Eagle 3

The Eagle 3 paper is inspired by HASS, and Zhihu summarizes its improvements as follows:

  1. Uses features from 3 layers
  2. Multi-step training similar to HASS
  3. Removes the Smooth L1 loss on features, which was used to narrow the gap between training and inference features. It is not needed with multi-step training.
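For item 1, my understanding is that the features from the three layers are concatenated and projected back to the hidden size by a linear layer before they enter the draft model. The sketch below illustrates only that step with plain lists; `fuse_features` and the averaging weight matrix are made up for the example, and in practice the projection is learned.

```python
def fuse_features(low, mid, high, weight):
    """Concatenate three k-dim features and project 3k -> k.

    weight is a k x 3k matrix given as a list of rows (no bias, for brevity).
    """
    concat = low + mid + high  # list concatenation, length 3k
    return [sum(w * x for w, x in zip(row, concat)) for row in weight]

# Toy example with k = 2: the projection just averages the three layers.
third = 1.0 / 3.0
weight = [
    [third, 0.0, third, 0.0, third, 0.0],
    [0.0, third, 0.0, third, 0.0, third],
]
fused = fuse_features([1.0, 2.0], [4.0, 5.0], [7.0, 8.0], weight)
print([round(x, 6) for x in fused])
```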
