
Data Science LAB


[CV] Object Detection Neck Summary

ㅅ ㅜ ㅔ ㅇ 2023. 5. 21. 17:41
๋ณธ ํฌ์ŠคํŒ…์€ Naver Boostcamp AI Tech 5๊ธฐ
Object Detection ๊ฐ•์˜ ์ž๋ฃŒ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์ž‘์„ฑ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. 

 

Why the Neck Was Introduced

๊ธฐ์กด์˜ RPN์€ backbone network๋ฅผ ํ†ต๊ณผํ•œ ๋งˆ์ง€๋ง‰ feature map๋งŒ ์‚ฌ์šฉํ•˜์—ฌ RPN์„ ํ†ตํ•œ ROI๋ฅผ ์ถ”์ถœํ•˜์˜€๋‹ค. 

-> ์ค‘๊ฐ„์— ์žˆ๋Š” feature๋“ค์„ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด Neck ๋“ฑ์žฅ

- Low-level feature maps (large spatial size) are used to detect small objects, and high-level feature maps (small spatial size) are used to detect large objects.
 (High-level features are rich in semantic information but lack localization information; low-level features are rich in localization information but lack semantic information.)
- Objects of various sizes must be predicted from the feature maps produced by the backbone, but if only the last layer is used, small objects are hard to detect.
- The Neck was introduced to mix the intermediate features well so that the detector gets richer information.

 

 

 

FPN (Feature Pyramid Network)
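Instead of resizing the input image itself to extract features at several scales (an image pyramid), FPN builds the pyramid from the feature maps the backbone already computes at its different stages.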

 

1. Semantic information has to be passed from the high levels to the low levels, so a top-down pathway is added.

-> Through the pyramid structure, information is passed sequentially from high to low levels.

 

 

2. The two feature maps are merged through a lateral connection (element-wise addition).

 

 

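3. Because the two feature maps have different shapes, the feature map from the bottom-up pathway is passed through a 1 × 1 conv layer that brings its channel count to that of the top-down feature map (256 channels in the FPN paper, so for the deeper ResNet stages this actually reduces the number of channels).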

 

 

4. The feature map from the top-down pathway is upsampled by a factor of 2 to increase its width and height (nearest-neighbor upsampling).
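The four steps above can be sketched in plain Python for a single top-down merge. This is a minimal toy under stated assumptions: feature maps are nested lists of shape [C][H][W], the lateral 1 × 1 conv weights are hypothetical values passed in by hand (in a real FPN they are learned), and the 3 × 3 conv that FPN applies after each merge to reduce upsampling aliasing is omitted.

```python
def conv1x1(fmap, weights):
    """Lateral 1x1 conv: each output channel is a weighted sum of the input channels.

    fmap: [C_in][H][W]; weights: [C_out][C_in] (hypothetical, normally learned).
    """
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(row_w[c] * fmap[c][y][x] for c in range(c_in)) for x in range(w)]
         for y in range(h)]
        for row_w in weights
    ]


def upsample_nearest_2x(fmap):
    """Nearest-neighbor upsampling by 2: every pixel is copied into a 2x2 block."""
    return [
        [[ch[y // 2][x // 2] for x in range(2 * len(ch[0]))] for y in range(2 * len(ch))]
        for ch in fmap
    ]


def add(a, b):
    """Element-wise addition of two [C][H][W] maps with identical shapes."""
    return [
        [[va + vb for va, vb in zip(ra, rb)] for ra, rb in zip(ca, cb)]
        for ca, cb in zip(a, b)
    ]


def top_down_merge(top, lateral, lateral_weights):
    """One FPN merge: P_i = upsample(P_{i+1}) + conv1x1(C_i)."""
    up = upsample_nearest_2x(top)            # step 4: match width and height
    lat = conv1x1(lateral, lateral_weights)  # step 3: match the channel count
    if (len(up), len(up[0]), len(up[0][0])) != (len(lat), len(lat[0]), len(lat[0][0])):
        raise ValueError("top-down and lateral maps must have the same shape")
    return add(up, lat)                      # step 2: merge via the lateral connection


# Toy example: P5 has 1 channel at 1x1, C4 has 2 channels at 2x2.
p5 = [[[10.0]]]
c4 = [[[1.0, 2.0], [3.0, 4.0]],
      [[0.5, 0.5], [0.5, 0.5]]]
w_lateral = [[1.0, 2.0]]  # hypothetical 1x1 conv: 2 input channels -> 1 output channel
p4 = top_down_merge(p5, c4, w_lateral)
print(p4)  # [[[12.0, 13.0], [14.0, 15.0]]]
```

The merge is a plain sum, which is the point later necks (BiFPN) revisit by weighting the inputs.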

 

 

 

Overall Pipeline
Backbone: ResNet

ResNet has 4 stages, and the width and height are halved from one stage to the next (a stride-2 max pooling before the first stage, stride-2 convolutions after that).

 

์ตœ์ข…์ ์œผ๋กœ ์ถœ๋ ฅ๋œ RoI๋ฅผ ๋Œ€์ƒ์œผ๋กœ NMS๋ฅผ ์ˆ˜ํ–‰ํ•จ -> 1000๊ฐœ์˜ ROI๋กœ ๋งŒ๋“ฆ
๊ฐ ROI๊ฐ€ ์–ด๋–ค feature map์—์„œ ๋‚˜์˜จ ๊ฑด์ง€ ์•Œ ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์— Mapping๊ณผ์ •์ด ํ•„์š”ํ•จ
๊ณ„์‚ฐ๋œ k๊ฐ’์ด ์ž‘์„ ์ˆ˜๋ก low-level์ž„์„ ์˜๋ฏธ

 

 

 

PANet (Path Aggregation Network)

FPN์—์„œ๋Š” ๋„ˆ๋ฌด ๋งŽ์€ feature ๊ฐ€ conv layer๋ฅผ ํ†ต๊ณผํ•ด ์ •๋ณด๊ฐ€ top์œผ๋กœ ์ž˜ ๊ฐˆ ์ˆ˜ ์žˆ๋Š”์ง€ ์˜๋ฌธ ๋•Œ๋ฌธ์— ๋“ฑ์žฅ

 

 

-> After FPN's top-down pathway, a bottom-up pathway is added again (bottom-up path augmentation).
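The bottom-up path augmentation can be sketched on single-channel 2D maps. In PANet, N2 = P2 and each next level is built as N_{i+1} = conv(downsample(N_i) + P_{i+1}). Assumptions for this toy: the learned stride-2 3 × 3 conv is replaced by 2 × 2 average pooling, and the 3 × 3 conv after the addition is omitted.

```python
def downsample_2x(fmap):
    """Stand-in for PANet's stride-2 3x3 conv: 2x2 average pooling (assumption)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [(fmap[y][x] + fmap[y][x + 1] + fmap[y + 1][x] + fmap[y + 1][x + 1]) / 4
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]


def add(a, b):
    """Element-wise addition of two 2D maps with identical shapes."""
    return [[va + vb for va, vb in zip(ra, rb)] for ra, rb in zip(a, b)]


def bottom_up_path(fpn_outputs):
    """fpn_outputs: [P2, P3, ...] from fine to coarse. Returns [N2, N3, ...].

    N2 = P2 and N_{i+1} = downsample(N_i) + P_{i+1}
    (the extra 3x3 conv after the addition is omitted in this sketch).
    """
    outputs = [fpn_outputs[0]]
    for p_next in fpn_outputs[1:]:
        outputs.append(add(downsample_2x(outputs[-1]), p_next))
    return outputs


p2 = [[1.0] * 4 for _ in range(4)]
p3 = [[10.0] * 2 for _ in range(2)]
p4 = [[100.0]]
n = bottom_up_path([p2, p3, p4])
print(n[1])  # [[11.0, 11.0], [11.0, 11.0]]
print(n[2])  # [[111.0]]
```

With this extra path, information from the finest level reaches the top through one small block per level instead of through the whole backbone.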

 

For RoIs near the boundary between levels, the k-based level assignment does not work well (two RoIs of almost the same size can end up on different levels).

  • FPN: RoI projection & pooling on a single stage
  • Adaptive Feature Pooling: RoI projection & pooling on all stages, then the results are fused
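A toy comparison of the two strategies, assuming single-channel maps. The same RoI (in image coordinates) is projected onto each level by dividing by that level's stride and pooled; for Adaptive Feature Pooling the per-level results are then fused with an element-wise max (the PANet paper uses max or sum fusion; max is chosen here). Pooling each RoI down to one average value is a simplification: real implementations use RoIAlign to a fixed grid such as 7 × 7.

```python
import math


def roi_avg_pool(fmap, box, stride):
    """Project an image-space box (x1, y1, x2, y2) onto a feature map with the
    given stride and average the covered cells (toy 1x1 RoI pooling)."""
    h, w = len(fmap), len(fmap[0])
    x1, y1, x2, y2 = box
    # Projection: divide by the stride, clip to the map, keep at least one cell.
    fx1 = min(int(x1 // stride), w - 1)
    fy1 = min(int(y1 // stride), h - 1)
    fx2 = min(max(math.ceil(x2 / stride), fx1 + 1), w)
    fy2 = min(max(math.ceil(y2 / stride), fy1 + 1), h)
    cells = [fmap[y][x] for y in range(fy1, fy2) for x in range(fx1, fx2)]
    return sum(cells) / len(cells)


def fpn_single_level_pool(levels, strides, box, level_index):
    """FPN: pool only from the one level chosen by the k heuristic."""
    return roi_avg_pool(levels[level_index], box, strides[level_index])


def adaptive_feature_pool(levels, strides, box):
    """PANet Adaptive Feature Pooling: pool from every level, fuse with max."""
    pooled = [roi_avg_pool(f, box, s) for f, s in zip(levels, strides)]
    return max(pooled)


lv0 = [[0.0] * 4, [0.0, 5.0, 5.0, 0.0], [0.0, 5.0, 5.0, 0.0], [0.0] * 4]  # stride 4
lv1 = [[1.0, 2.0], [3.0, 8.0]]                                            # stride 8
box = (4, 4, 12, 12)
print(fpn_single_level_pool([lv0, lv1], [4, 8], box, 1))  # 3.5
print(adaptive_feature_pool([lv0, lv1], [4, 8], box))     # 5.0
```

If the heuristic happens to pick the level where the object's response is weaker, single-level pooling loses that signal, while pooling from all levels lets the fusion keep the strongest response.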

 

 

 

DetectoRS

: ๊ฐ™์€ ๊ฒƒ์„ ๋‘ ๋ฒˆ์”ฉ ๋ฐ˜๋ณต์ ์œผ๋กœ ๋ณด๋ฉด ์„ฑ๋Šฅ์ด ์ข‹์•„์ง€๋Š” ์ง€ ํ™•์ธํ•œ ๋ชจ๋ธ

Recursive Feature Pyramid (RFP)
backbone์„ ํ†ตํ•ด FPN ๋œ ๊ฒฐ๊ณผ๋ฌผ์„ ๋‹ค์‹œ Backbone์— ํ•™์Šต์‹œํ‚ด

ํ•™์Šต์†๋„๊ฐ€ ๋งค์šฐ ๋Š๋ฆผ (ํ•œ๋ฒˆ Neck์„ ํ†ต๊ณผํ•œ stage๋“ค์€ ASPP๋ฅผ ๊ฑฐ์ณ์„œ ๋‹ค์‹œ ํ•œ๋ฒˆ feature map๋“ค๊ณผ ํ•ฉ์ณ์ง)

 

ASPP (Atrous Spatial Pyramid Pooling)
A way to enlarge the receptive field: on a single feature map, convolutions with different dilation rates are applied in parallel, so each branch covers a different receptive-field size, and the results are combined.
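A minimal sketch on a single-channel 2D map. A 3 × 3 kernel with dilation rate d samples pixels d apart, so it covers a (2d + 1) × (2d + 1) window with no extra weights (in general k + (k - 1)(d - 1) for a k × k kernel). ASPP runs several such branches on the same input and stacks the results. The rates (1, 2, 3) and the all-ones kernel are hypothetical choices for a tiny example; DeepLab's ASPP uses larger rates such as 6, 12 and 18 together with a 1 × 1 branch and image-level pooling.

```python
def dilated_conv3x3(fmap, kernel, d):
    """3x3 convolution with dilation rate d and zero 'same' padding.

    Tap (i, j) reads the input at offset ((i-1)*d, (j-1)*d) from the output pixel.
    """
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    yy, xx = y + (i - 1) * d, x + (j - 1) * d
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside the map
                        acc += kernel[i][j] * fmap[yy][xx]
            out[y][x] = acc
    return out


def effective_kernel_size(k, d):
    """Width of the window covered by a k x k kernel with dilation d."""
    return k + (k - 1) * (d - 1)


def aspp(fmap, kernel, rates=(1, 2, 3)):
    """Run one dilated branch per rate on the same input and stack them as channels."""
    return [dilated_conv3x3(fmap, kernel, d) for d in rates]


# A single 1.0 in the center of a 7x7 map shows each branch's sampling pattern.
center = [[1.0 if (y, x) == (3, 3) else 0.0 for x in range(7)] for y in range(7)]
ones = [[1.0] * 3 for _ in range(3)]
branches = aspp(center, ones, rates=(1, 2, 3))
for d, branch in zip((1, 2, 3), branches):
    span = [x for x in range(7) if branch[3][x] != 0.0]
    print(d, effective_kernel_size(3, d), span)
# 1 3 [2, 3, 4]
# 2 5 [1, 3, 5]
# 3 7 [0, 3, 6]
```

All branches use the same 9 weights, but the larger the rate, the wider (and sparser) the area each output pixel sees.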

 

 

 

BiFPN (Bidirectional Feature Pyramid Network)

A simplified version of PANet: for efficiency, nodes that receive a feature map from only one input are removed.

 

- Instead of a plain summation as in FPN, each input feature is given a weight before the summation
- The model size barely increases
- The per-feature weights emphasize the important features, which improves performance
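A minimal sketch of the weighted summation, using the "fast normalized fusion" from the EfficientDet paper: O = sum_i (w_i / (eps + sum_j w_j)) * I_i, with a ReLU keeping each w_i non-negative and eps = 0.0001. Feature maps are flattened to lists of floats here, and the weight values are hypothetical (in BiFPN they are learned scalars).

```python
def fast_normalized_fusion(inputs, weights, eps=1e-4):
    """BiFPN weighted fusion: O = sum_i w_i * I_i / (eps + sum_j w_j).

    inputs: feature maps of identical shape (flattened lists of floats).
    weights: one scalar per input (hypothetical values; learned in BiFPN).
    """
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps weights non-negative
    norm = eps + sum(w)
    return [
        sum(wi * feat[k] for wi, feat in zip(w, inputs)) / norm
        for k in range(len(inputs[0]))
    ]


def plain_sum(inputs):
    """FPN-style fusion for comparison: unweighted element-wise sum."""
    return [sum(vals) for vals in zip(*inputs)]


p4_in = [1.0, 2.0]    # same-level input
p5_up = [10.0, 10.0]  # upsampled higher-level input
print(plain_sum([p4_in, p5_up]))                                # [11.0, 12.0]
print(fast_normalized_fusion([p4_in, p5_up], [3.0, 1.0]))  # approximately [3.25, 4.0]
```

Only one scalar is added per input edge, which is why the model size barely grows.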

 

 

NAS-FPN

A model that addresses a limitation of FPN and PANet: they merge feature maps only along fixed top-down and bottom-up pathways.

It tries to find the optimal FPN architecture through NAS (Neural Architecture Search).

 

Drawbacks
- The architecture was searched on the COCO dataset with a ResNet backbone, so it does not generalize (if the dataset or the backbone changes, the search has to be run again)
- High search cost

 

 

 

 

AugFPN

Addresses two problems of FPN: the highest-level feature only goes through an operation that reduces its number of channels before it is passed down to the lower levels, which causes information loss, and each RoI is generated from only one feature map. AugFPN mitigates these problems to some extent.

Residual Feature Augmentation
Ratio-invariant Adaptive Pooling: generates feature maps at several scales (256 channels each)

 

 

๋™์ผํ•œ scale๋กœ upsamplig ํ•œ ๋’ค ๊ฐ€์ค‘์น˜๋ฅผ ๋‘๊ณ  ํ•ฉ์นจ
