There is really no need to know the implementation details of every YOLO version from v1 to v9. But since it is the one model that has helped me a lot on AWS projects, I still enjoyed watching this vlog and want to share some details.
First, a review of SSD, whose ideas are used in YOLO as well.
Essentially, SSD does R-CNN-style detection at different scales. Let’s start from the basic R-CNN:
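It can be accelerated by running the CNN once on the whole image before applying the Selective Search proposals, instead of running it once per proposal (Fast R-CNN):
Replacing Selective Search with a Region Proposal Network (RPN) makes it faster still (Faster R-CNN):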
Now do the same detection at different scales:
From the network’s point of view, you get:
Here is a funny picture demonstrating that SSD can detect objects at different scales:
Mask R-CNN is introduced last.
2 YOLO v1
Compared with two-stage detection methods, YOLO, as the name suggests, looks only ONCE: it does localization and classification at the same time.
The YOLO loss combines three parts: a coordinate (localization) loss, an IoU-based confidence loss, and a classification loss.
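To make the three terms concrete, here is a minimal sketch for a single grid cell that contains an object. It assumes the squared-error form and the coordinate weight of 5 from the original YOLO paper. The helper names (`iou`, `to_corners`, `cell_loss`) and the dict layout are made up for illustration. The no-object term and the sum over all cells are left out to keep it short.

```python
import math

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def to_corners(cx, cy, w, h):
    """Convert a (center x, center y, width, height) box to corner form."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def cell_loss(pred, target, lambda_coord=5.0):
    """Simplified loss for one cell with one responsible predictor.

    pred:   {"box": (cx, cy, w, h), "conf": float, "probs": [...]}
    target: {"box": (cx, cy, w, h), "probs": [...]}  (hypothetical layout)
    """
    px, py, pw, ph = pred["box"]
    tx, ty, tw, th = target["box"]
    # Coordinate loss: square roots soften the penalty on large boxes.
    coord = ((px - tx) ** 2 + (py - ty) ** 2
             + (math.sqrt(pw) - math.sqrt(tw)) ** 2
             + (math.sqrt(ph) - math.sqrt(th)) ** 2)
    # Confidence loss: the target confidence is the IoU with the ground truth.
    conf_target = iou(to_corners(*pred["box"]), to_corners(*target["box"]))
    conf = (pred["conf"] - conf_target) ** 2
    # Classification loss: squared error over class probabilities.
    cls = sum((p - t) ** 2 for p, t in zip(pred["probs"], target["probs"]))
    return lambda_coord * coord + conf + cls
```

A perfect prediction (same box, confidence 1, same class probabilities) gives a loss of zero, and any shift of the box raises both the coordinate term and the confidence term.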
3 YOLO v2
In v2, the improvements come from changing the network to Darknet-19, among other things.
Different datasets were used (from Pascal VOC with 20 classes to ImageNet and COCO combined, covering about 9000 classes),
and a tree structure (WordTree) is used to identify the 9000 classes.
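The tree idea can be sketched as follows: each node stores a probability conditioned on its parent, and the probability of a specific label is the product along the path to the root. The tiny tree, the labels, and the numbers below are invented for illustration. In the real model the conditional probabilities come from softmaxes over sibling groups.

```python
# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT = {
    "animal": None,
    "dog": "animal",
    "cat": "animal",
    "terrier": "dog",
    "husky": "dog",
}

def absolute_prob(label, cond_prob):
    """Multiply conditional probabilities from `label` up to the root."""
    p = 1.0
    node = label
    while node is not None:
        p *= cond_prob[node]
        node = PARENT[node]
    return p

# Made-up conditional probabilities; siblings sum to 1.
cond_prob = {"animal": 0.9, "dog": 0.8, "cat": 0.2, "terrier": 0.7, "husky": 0.3}
p_terrier = absolute_prob("terrier", cond_prob)  # 0.7 * 0.8 * 0.9, about 0.504
```

One nice property of this design is that the model can stop at "dog" when it is not confident about the breed, instead of being forced to pick a leaf class.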
4 YOLO v3
The coordinate calculation has changed since v2: instead of predicting unconstrained offsets relative to an anchor box, the network predicts the box location directly, relative to the grid cell.
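A minimal sketch of that decoding step, assuming the formulation from the YOLOv2 paper: the center offsets go through a sigmoid so the center stays inside its grid cell, and the anchor width and height are scaled by an exponential. The function name `decode_box` and the convention that anchors are given in grid-cell units are assumptions for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(t, cell, anchor, grid_size):
    """Turn raw network outputs into a box normalized to [0, 1].

    t:         raw outputs (tx, ty, tw, th)
    cell:      (cx, cy) column and row index of the grid cell
    anchor:    (pw, ph) anchor size in grid-cell units (assumed convention)
    grid_size: number of cells along one side of the grid
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    bx = (sigmoid(tx) + cx) / grid_size   # center x, bounded to its cell
    by = (sigmoid(ty) + cy) / grid_size   # center y, bounded to its cell
    bw = pw * math.exp(tw) / grid_size    # width scales the anchor
    bh = ph * math.exp(th) / grid_size    # height scales the anchor
    return bx, by, bw, bh
```

With all raw outputs at zero, the box sits at the center of its cell and has exactly the anchor's size, which is why training tends to be more stable than with unconstrained offsets.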
v3 adds an SSD-like architecture to detect objects at different scales.
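As a rough illustration of what "different scales" means in v3, the sketch below computes the grid size at each detection head and the total number of predicted boxes. It assumes the commonly used 416x416 input, strides of 32, 16, and 8, and 3 anchors per scale. The function name `yolo_v3_grids` is made up for this example.

```python
def yolo_v3_grids(input_size=416, strides=(32, 16, 8), anchors_per_scale=3):
    """Return the grid side length per scale and the total box count."""
    grids = [input_size // s for s in strides]
    total_boxes = sum(g * g * anchors_per_scale for g in grids)
    return grids, total_boxes

grids, total_boxes = yolo_v3_grids()
# grids is [13, 26, 52]: the coarse grid handles large objects,
# the fine grid handles small ones.
```

Under these assumptions the three heads together predict 10647 boxes per image, which non-maximum suppression then filters down.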