250x250
Link
๋‚˜์˜ GitHub Contribution ๊ทธ๋ž˜ํ”„
Loading data ...
Notice
Recent Posts
Recent Comments
๊ด€๋ฆฌ ๋ฉ”๋‰ด

Data Science LAB

[CV] Object Detection 1stage detectors ๋ณธ๋ฌธ

๐Ÿ–ฅ๏ธ Computer Vision/Object Detection

[CV] Object Detection 1stage detectors

ใ…… ใ…œ ใ…” ใ…‡ 2023. 5. 22. 00:46
728x90

 

2 stage detector๋Š” localizatin๊ณผ classification์„ ๋ชจ๋‘ ํ•ด์•ผ ํ•™์Šต์ด ์ง„ํ–‰๋˜๊ธฐ ๋•Œ๋ฌธ์— ์†๋„๊ฐ€ ๋งค์šฐ ๋Š๋ฆฌ๋‹ค๋Š” ๋‹จ์  ๋•Œ๋ฌธ์— 1 stage detector๊ฐ€ ๋“ฑ์žฅํ•˜๊ฒŒ ๋˜์—ˆ๋‹ค. 

 

1 stage detectors
- localization, classification ๋™์‹œ์— ์ง„ํ–‰
- ์ „์ฒด ์ด๋ฏธ์ง€์— ๋Œ€ํ•ด ํŠน์ง• ์ถ”์ถœ, ๊ฐ์ฒด ๊ฒ€์ถœ์ด ์ด๋ฃจ์–ด์ง -> ๊ฐ„๋‹จํ•˜๊ณ  ์‰ฌ์šด ๋””์ž์ธ
- ์†๋„๊ฐ€ ๋งค์šฐ ๋น ๋ฆ„ (Real-time detection)
- ์˜์—ญ์„ ์ถ”์ถœํ•˜์ง€ ์•Š๊ณ  ์ „์ฒด ์ด๋ฏธ์ง€๋ฅผ ๋ณด๊ธฐ ๋•Œ๋ฌธ์— ๊ฐ์ฒด์— ๋Œ€ํ•œ ๋งฅ๋ฝ์  ์ดํ•ด๊ฐ€ ๋†’์Œ (Background error๊ฐ€ ์ ์Œ)

 

 

YOLO

  • YOLO v1 : ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€์˜ Bbox์™€ classification ๋™์‹œ์— ์˜ˆ์ธกํ•˜๋Š” 1 stage detector ๋“ฑ์žฅ
  • YOLO v2 : ๋น ๋ฅด๊ณ  ๊ฐ•๋ ฅํ•˜๊ณ  ๋” ์ข‹๊ฒŒ, 3๊ฐ€์ง€ ์ธก๋ฉด์—์„œ model ํ–ฅ์ƒ
  • YOLO v3 : multi-scale feature maps ์‚ฌ์šฉ
  • YOLO v4 : ์ตœ์‹  ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ์ˆ  ์‚ฌ์šฉ, (BOF : Bag of Freebies, BOS: Bag of Specials)
  • YOLO v5: ํฌ๊ธฐ๋ณ„๋กœ ๋ชจ๋ธ ๊ตฌ์„ฑ (Small, Medium, Large, Xlarge)

- Region proposal ๋‹จ๊ณ„ X
- ์ „์ฒด ์ด๋ฏธ์ง€์—์„œ bbox ์˜ˆ์ธก๊ณผ ํด๋ž˜์Šค๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ์ผ์„ ๋™์‹œ์— ์ง„ํ–‰
- ์ด๋ฏธ์ง€, ๋ฌผ์ฒด๋ฅผ ์ „์ฒด์ ์œผ๋กœ ๊ด€์ฐฐํ•˜์—ฌ ์ถ”๋ก ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋งฅ๋ฝ์  ์ดํ•ด๊ฐ€ ๋†’์•„์ง

 

Network

- GoogleNet์˜ ๋ณ€ํ˜• ๋ฒ„์ „์œผ๋กœ 24๊ฐœ์˜ Conv layer๋กœ ํŠน์ง•์„ ์ถ”์ถœํ•˜๊ณ  2๊ฐœ์˜ fc layer๋กœ box์˜ ์ขŒํ‘œ๊ฐ’๊ณผ ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•จ

1. ์ž…๋ ฅ ์ด๋ฏธ์ง€๋ฅผ 7 X 7 ๊ทธ๋ฆฌ๋“œ ์˜์—ญ์œผ๋กœ ๋‚˜๋ˆ”
2. ๊ฐ ๊ทธ๋ฆฌ๋“œ ์˜์—ญ๋งˆ๋‹ค B๊ฐœ์˜ ์„œ๋กœ ๋‹ค๋ฅธ bbox์™€ confidence score๋ฅผ ๊ณ„์‚ฐํ•จ (B=2)
3. ๊ฐ ๊ทธ๋ฆฌ๋“œ ์˜์—ญ๋งˆ๋‹ค C๊ฐœ์˜ class์— ๋Œ€ํ•ด ํ•ด๋‹น ํด๋ž˜์Šค์ผ ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•จ (C=20)

 

1. ํ•˜๋‚˜์˜ ๊ทธ๋ฆฌ๋“œ๋Š” ์ด 30๊ฐœ์˜ ์ฑ„๋„์„ ๊ฐ€์ง
2. ์ด 30๊ฐœ์˜ ์ฑ„๋„ ์ค‘ ๋งจ ์•ž 5๊ฐœ์˜ ์ฑ„๋„์€ ์ฒซ๋ฒˆ์งธ bbox์˜ ์ •๋ณด๋ฅผ ๋‹ด๋Š”๋‹ค (x,y,w,h, confidence score)
3. 6~10๋ฒˆ์งธ ์ฑ„๋„์€ ๋‘๋ฒˆ์งธ bbox์˜ ์ •๋ณด๋ฅผ ๋‹ด์Œ (grid cell๋งˆ๋‹ค 2๊ฐœ์˜ bbox๊ฐ€ ์กด์žฌํ•จ)
4. ๋‚˜๋จธ์ง€ 20๊ฐœ์˜ ์ฑ„๋„์€ class์˜ score๋ฅผ ์˜๋ฏธํ•จ (yolo์—์„œ ์‚ฌ์šฉ๋œํŒŒ์Šค์นผ ๋ฐ์ดํ„ฐ์…‹์˜ ํด๋ž˜์Šค๋Š” ์ด 20๊ฐœ)
5. ๊ฐ๊ฐ์˜ class score๋Š” ๊ฐ bbox ๋‘๊ฐœ์˜ ํ™•๋ฅ ์„ ๊ฐ๊ฐ ์˜ˆ์ธกํ•œ๋‹ค. 

 

 

=> ์ด 7*7*2 = 98 ๊ฐœ์˜ ์˜ˆ์ธก cell์ด ๋งŒ๋“ค์–ด ์ง„๋‹ค. ์ด 98๊ฐœ์˜ cell์„ threshold๊ฐ’์œผ๋กœ ํ•„ํ„ฐ๋ง ํ›„ descendingํ•˜๊ณ , NMS ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•œ๋‹ค. 

 

Loss

 

์žฅ์ 

  • Faster R-CNN์— ๋น„ํ•ด 6๋ฐฐ ๋น ๋ฆ„
  • ๋‹ค๋ฅธ real-time detector์— ๋น„ํ•ด 2๋ฐฐ ๋†’์€ ์ •ํ™•๋„
  • ์ด๋ฏธ์ง€ ์ „์ฒด๋ฅผ ๋ณด๊ธฐ ๋•Œ๋ฌธ์— ํด๋ž˜์Šค์™€ ์‚ฌ์ง„์— ๋Œ€ํ•œ ๋งฅ๋ฝ์ ์ธ ์ •๋ณด๋ฅผ ๊ฐ€์ง
  • ๋ฌผ์ฒด์˜ ์ผ๋ฐ˜ํ™”๋œ ํ‘œํ˜„์„ ํ•™์Šตํ•จ -> ์ƒˆ๋กœ์šด ๋„๋ฉ”์ธ์— ๋Œ€ํ•œ ์ด๋ฏธ์ง€์— ๋Œ€ํ•ด์„œ๋„ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ž„

 

๋‹จ์ 

  • 7X7 ๊ทธ๋ฆฌ๋“œ ์˜์—ญ์œผ๋กœ ๋‚˜๋‰˜์–ด bbox predcition -> ๊ทธ๋ฆฌ๋“œ๋ณด๋‹ค ์ž‘์€ ํฌ๊ธฐ์˜ object ๊ฒ€์ถœ X
  • ์‹ ๊ฒฝ๋ง์„ ํ†ต๊ณผํ•˜๋ฉฐ ๋งˆ์ง€๋ง‰ feature ์‚ฌ์šฉ -> ์ •ํ™•๋„ ํ•˜๋ฝ

 

YOLOv2

- ์ •ํ™•๋„, ์†๋„, ๋” ๋งŽ์€ ํด๋ž˜์Šค๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ๋†’์ž„
- Batch Normalization : mAP 2% ์„ฑ๋Šฅ ํ–ฅ์ƒ
- High Resolution Classifier
   - YOLOv1: 224x224 ์ด๋ฏธ์ง€๋กœ ์‚ฌ์ „ ํ•™์Šต๋œ VGG๋ฅผ 448x448 Detection Task์— ์ ์šฉ
   - YOLOV2: 448x448 ์ด๋ฏธ์ง€๋กœ ์ƒˆ๋กญ๊ฒŒ finetuning
   -> mAP 4% ์ •๋„ ์„ฑ๋Šฅ ํ–ฅ์ƒ
- Convolution with anchor boxes : FCN layer ์ œ๊ฑฐ, Anchor box๋ฅผ ๋„์ž…, K means clusters on COCO datasets
(5๊ฐœ์˜ anchor box)
- ์ขŒํ‘œ ๊ฐ’ ๋Œ€์‹  offset ์˜ˆ์ธกํ•˜๋Š” ๋ฌธ์ œ๊ฐ€ ๋‹จ์ˆœํ•˜๊ณ  ํ•™์Šตํ•˜๊ธฐ ์‰ฌ์›€ : mAP 5% ์ •๋„์˜ ์„ฑ๋Šฅํ–ฅ์ƒ
- Fine-grained features
   - ํฌ๊ธฐ๊ฐ€ ์ž‘์€ feature map์€ low level ์ •๋ณด๊ฐ€ ๋ถ€์กฑํ•จ
   - Early feature map์€ ์ž‘์€ low level ์ •๋ณด๋ฅผ ๋‹ด์Œ
  -> Early feature map์„ late feature map์— ํ•ฉ์ณ์ฃผ๋Š” layer ๋„์ž…
  - 26x26 feature map์„ ๋ถ„ํ• ํ›„์— ๊ฒฐํ•ฉ์„ ํ•จ
- Multi-scale training : ๋‹ค์–‘ํ•œ ์ž…๋ ฅ ์‚ฌ์ด์ฆˆ์˜ ์ด๋ฏธ์ง€ ์‚ฌ์šฉ

- ImageNet (classification) ๋ฐ์ดํ„ฐ์…‹๊ณผ COCO (detection) ๋ฐ์ดํ„ฐ์…‹์„ ํ•จ๊ป˜ ์‚ฌ์šฉ
- WordTree ๊ตฌ์„ฑ : 9418๊ฐœ์˜ ๋ฒ”์ฃผ

 

 

 

 

YOLO v3

- Backbone์„ Darknet53์œผ๋กœ ๋ณ€๊ฒฝํ•จ
- Skip connection์„ ์ ์šฉํ•จ
- Max pooling์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ  ๋Œ€์‹  Convolution stride 2๋ฅผ ์‚ฌ์šฉํ•จ
- ResNet-101, ResNet-142์™€ ๋น„์Šทํ•œ ์„ฑ๋Šฅ์„ ์ง€๋‹ˆ๊ณ  FPS๊ฐ€ ๋†’๋‹ค๋Š”๊ฒŒ ์žฅ์ 

 

 

 

RetinaNet

1-Stage Detector๋Š” RPN์ด ์—†๊ณ  ๊ทธ๋ฆฌ๋“œ๋ณ„๋กœ bounding box๋ฅผ ๋ฌด์กฐ๊ฑด ์˜ˆ์ธกํ•˜๊ธฐ ๋•Œ๋ฌธ์— background๋ฅผ ์žก๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋” ๋งŽ์ด ์ƒ๊ฒจ์„œ ํด๋ž˜์Šค ๋ถˆ๊ท ํ˜•๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.

Class ๋ถˆ๊ท ํ˜•
- Positive sample(๊ฐ์ฒด) < negative sample(๋ฐฐ๊ฒฝ)
Anchor box ๋Œ€๋ถ€๋ถ„ ๋ฐฐ๊ฒฝ์ด๋‹ค.
- 2 Stage Detector์˜ ๊ฒฝ์šฐ region proposal์—์„œ background sample์„ ์ œ๊ฑฐํ•œ๋‹ค.
Positive/Negative sample ์ˆ˜๋ฅผ ์ ์ ˆํ•˜๊ฒŒ ์œ ์ง€ํ•œ๋‹ค.
Cross Entropy์— Scaling Facor๋ฅผ ์ถ”๊ฐ€ํ•˜์˜€๋‹ค.
์ƒˆ๋กœ์šด Loss function์„ ์ œ์‹œํ–ˆ๋‹ค : CSE + scaling factor
์‰ฌ์šด ์˜ˆ์ œ์— ์ ์€ ๊ฐ€์ค‘์น˜๋ฅผ ์ฃผ๊ณ  ์–ด๋ ค์šด ์˜ˆ์ œ์— ํฐ ๊ฐ€์ค‘์น˜๋ฅผ ๋ถ€์—ฌํ•œ๋‹ค.

Focal Loss
1) Object detection์—์„œ back ground์™€ class์˜ imbalance ์กฐ์ •
2) Object detection ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ class imbalance๊ฐ€ ์‹ฌํ•œ dataset์„ ํ•™์Šตํ•  ๋•Œ ๋ชจ๋‘ ํ™œ์šฉ ๊ฐ€๋Šฅํ•จ 

 

 

 

 

 

SSD

SSD YOLO
300X300 input 448X448 input
2๊ฐœ์˜ fc layer 1X1 Conv ์‚ฌ์šฉํ•ด FC layer X

 

 

SSD
- Extra Conv layer์— ๋‚˜์˜จ feature map๋“ค์€ ๋ชจ๋‘ detection ์ˆ˜ํ–‰
- FC layer ๋Œ€์‹  Conv layer ์‚ฌ์šฉํ•ด ์†๋„ ํ–ฅ์ƒ
- Default box (anchor box)  ์‚ฌ์šฉํ•ด ์„œ๋กœ ๋‹ค๋ฅธ scale๊ณผ ๋น„์œจ์„ ๊ฐ€์ง„ ๋ฏธ๋ฆฌ ๊ณ„์‚ฐ๋œ box ์‚ฌ์šฉ

 

 

Network

VGG 16 (Backbone) + Extra Conv Layers

input : 300X300

 

Multi scale feature maps

 

anchor box๋“ค์˜ ์ค‘์‹ฌ์ ์˜ ์ขŒํ‘œ์™€ class์˜ ๊ฐœ์ˆ˜ + 1 (background) => ์ด 25๊ฐœ์˜ ์˜ˆ์ธก๊ฐ’์ด ๋‚˜์˜ด

 

NBbox ๋Š” 0.2~0.9๊นŒ์ง€ ์„ ํ˜•์ ์œผ๋กœ scale์„ ๋Š˜๋ ค๋‚˜๊ฐ€๋ฉฐ ์ด 6๊ฐœ์˜ default box๋ฅผ ํ™œ์šฉํ•จ

 

 

 

Default Box

feature map์˜ ๊ฐ cell๋งˆ๋‹ค ์„œ๋กœ ๋‹ค๋ฅธ scale, ๋น„์œจ์„ ๊ฐ€์ง„ ๋ฏธ๋ฆฌ ์ •ํ•ด์ง„ box๋ฅผ ์ƒ์„ฑํ•จ
Faster R-CNN์˜ anchor box์™€ ์œ ์‚ฌํ•จ

์ตœ์ข…์ ์œผ๋กœ ์ด 8732๊ฐœ์˜ default box๋ฅผ ์‚ฌ์šฉํ•˜๊ฒŒ ๋œ๋‹ค. 

Training

- Hard Negative Mining์„ ์ˆ˜ํ–‰
- NMS๋ฅผ ์ˆ˜ํ–‰

728x90
Comments