Knowledge Distillation


Distillation was introduced by Hinton, Vinyals, and Dean in 2015 ("Distilling the Knowledge in a Neural Network"), another masterpiece from Google.

The fundamental idea is that training and inference have different requirements, so a cumbersome model trained for accuracy can be compressed into a smaller one for deployment. The key prior work cited in the paper is model compression, by Rich Caruana and colleagues (Buciluă, Caruana, and Niculescu-Mizil, 2006).

The implementation lets the student learn from the teacher's logits, on top of learning from the ground-truth labels.

1 Softmax with Temperature

\(q_i = \frac{\exp(z_i/T)}{\sum_j \exp(z_j/T)}\)

The larger the temperature \(T\), the softer the resulting distribution: the differences between class probabilities shrink, exposing the teacher's relative preferences among the incorrect classes.
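As a minimal sketch (plain NumPy; the logits are made up just to show the flattening effect, not taken from the paper):

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    # Shift by the max for numerical stability, then divide by T.
    z = (logits - logits.max()) / T
    e = np.exp(z)
    return e / e.sum()

logits = np.array([5.0, 2.0, 1.0])
print(softmax_with_temperature(logits, T=1.0))   # ~[0.94, 0.05, 0.02], sharply peaked
print(softmax_with_temperature(logits, T=10.0))  # ~[0.41, 0.31, 0.28], much softer
```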


2 Soft Targets and Hard Targets

The teacher's temperature-softened outputs are called soft targets, and the true labels are hard targets. The total loss is the weighted sum of the two losses: one term against the soft targets and an ordinary cross-entropy term against the hard targets.
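A sketch of the combined loss in PyTorch (my own illustration, not the paper's code; the values of `T` and `alpha` here are hypothetical hyperparameters chosen for the example):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft-target term: KL divergence between the teacher's and student's
    # temperature-softened distributions. The T*T factor compensates for the
    # 1/T^2 gradient scaling introduced by the temperature, as noted in the paper.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the true labels at T=1.
    hard = F.cross_entropy(student_logits, labels)
    # Total loss is a weighted sum of the two.
    return alpha * soft + (1 - alpha) * hard
```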

3 Results

In the paper, the distilled model recovers most of the large model's accuracy: on MNIST, for example, a small net trained with distillation makes 74 test errors, versus 146 for the same net trained without distillation and 67 for the big teacher net.
