CLIP


Dr Vlog gave a talk on CLIP to math PhDs and summarized it in a 50-minute video.

0 Paper

CLIP is one of OpenAI’s early works on multimodal model training.

1 How it works

The foundational idea is contrastive learning between image and text content, which gives the model its name: CLIP (Contrastive Language-Image Pretraining). It can be used for zero-shot inference, which is its main advantage. Remarkably, its zero-shot result is even higher than its one-shot result, which is completely contrary to human behavior. This tells us that the real “learning” capability of humans is still far stronger than that of current models.
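To make the zero-shot idea concrete, here is a minimal sketch using a public CLIP checkpoint via Hugging Face transformers. The blank image and the candidate labels are illustrative placeholders, not examples from the talk:

```python
# Zero-shot classification sketch: score an image against arbitrary
# text labels with a pretrained CLIP model -- no task-specific training.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))  # stand-in; load your own image here
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image-text similarity scores; softmax
# over the labels turns them into a zero-shot prediction.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```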

Here is the dummy training code for CLIP, sketched below. The model also adapts well to image sets that are dramatically different from the original training distribution.
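A minimal PyTorch sketch of that training step, following the numpy-style pseudocode in the CLIP paper; the linear encoders, dimensions, and random batch are placeholders (real CLIP uses a ResNet/ViT image encoder and a Transformer text encoder):

```python
# CLIP's symmetric contrastive loss over a batch of N aligned
# (image, text) pairs: matching pairs sit on the diagonal of the
# similarity matrix, and cross-entropy is applied in both directions.
import torch
import torch.nn.functional as F

batch, d_img, d_txt, d_emb = 8, 512, 256, 128

# Placeholder encoders projecting both modalities into a joint space.
image_encoder = torch.nn.Linear(d_img, d_emb)
text_encoder = torch.nn.Linear(d_txt, d_emb)
logit_scale = torch.nn.Parameter(torch.tensor(2.659))  # learned temperature, init ln(1/0.07)

# Fake inputs standing in for a batch of paired image/text features.
images = torch.randn(batch, d_img)
texts = torch.randn(batch, d_txt)

# L2-normalize embeddings so dot products are cosine similarities.
img_emb = F.normalize(image_encoder(images), dim=-1)
txt_emb = F.normalize(text_encoder(texts), dim=-1)

# Pairwise similarities, scaled by the learned temperature.  [N, N]
logits = logit_scale.exp() * img_emb @ txt_emb.t()

# Symmetric cross-entropy: image->text and text->image.
labels = torch.arange(batch)
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
loss.backward()
```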

2 Applications

Apple’s MM1 model is based on CLIP, and it also shows amazing few-shot learning capability.
